From JRijnberk at wanadoo.nl  Sat Jan  1 00:01:08 2005
From: JRijnberk at wanadoo.nl (Hans van Rijnberk , Assort Vision, Utrecht)
Date: Fri Dec 31 23:57:42 2004
Subject: [Wekalist] Is there any class about ROC and AUC?
Message-ID: <2.2b9.32.20041231110108.008feea4@pop.cablewanadoo.nl>

Hi

Look for class ThresholdCurve.
AUC is provided by the method getROCArea. It estimates the area with the trapezoid
rule, which is exactly equivalent to the Mann-Whitney U statistic
(AUC = U / (n_pos * n_neg)).
Pay attention: a standard ROC curve with points (false-alarm %, hit %) includes
(0,0) (= reject all instances) and (100,100) (= accept all instances), whereas
Weka excludes these boundary points.
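The following pure-Java sketch makes the equivalence concrete. The class and method names are hypothetical, and this is not the ThresholdCurve source. It computes the area under the ROC curve in two ways:

- with the trapezoid rule over (false-positive rate, true-positive rate) points, running from (0,0) to (1,1);
- as the Mann-Whitney statistic U / (n_pos * n_neg), counting tied scores as one half.

Instances with tied scores are merged into a single ROC point so that the two numbers agree.

```java
import java.util.Arrays;
import java.util.Comparator;

class RocAuc {
    /** AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2. */
    static double mannWhitneyAuc(double[] scores, boolean[] positive) {
        double u = 0;
        int nPos = 0, nNeg = 0;
        for (boolean p : positive) { if (p) nPos++; else nNeg++; }
        for (int i = 0; i < scores.length; i++) {
            if (!positive[i]) continue;
            for (int j = 0; j < scores.length; j++) {
                if (positive[j]) continue;
                if (scores[i] > scores[j]) u += 1.0;
                else if (scores[i] == scores[j]) u += 0.5;
            }
        }
        return u / ((double) nPos * nNeg); // assumes both classes are present
    }

    /** AUC by the trapezoid rule over ROC points (FPR, TPR), from (0,0) to (1,1). */
    static double trapezoidAuc(double[] scores, boolean[] positive) {
        int n = scores.length, nPos = 0;
        for (boolean p : positive) if (p) nPos++;
        int nNeg = n - nPos; // assumes nPos > 0 and nNeg > 0
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        // Descending score: lowering the threshold accepts more and more instances.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -scores[i]));
        double area = 0, prevFpr = 0, prevTpr = 0; // start at (0,0) = reject all
        int tp = 0, fp = 0;
        for (int k = 0; k < n; k++) {
            int idx = order[k];
            if (positive[idx]) tp++; else fp++;
            // Emit a point only after every instance sharing this score is accepted.
            if (k + 1 < n && scores[order[k + 1]] == scores[idx]) continue;
            double fpr = (double) fp / nNeg, tpr = (double) tp / nPos;
            area += (fpr - prevFpr) * (tpr + prevTpr) / 2;
            prevFpr = fpr;
            prevTpr = tpr;
        }
        return area; // the final point is (1,1) = accept all
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.4, 0.2};
        boolean[] pos = {true, true, false, true, false, true, false, false};
        System.out.println(trapezoidAuc(scores, pos) + " " + mannWhitneyAuc(scores, pos)); // prints 0.78125 0.78125
    }
}
```

If the (0,0) and (1,1) end points are dropped, as described above for Weka's curve, the trapezoid sum has to be completed with those segments before it matches U.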

Hans van Rijnberk
Assort Vision


At 16:36 30/12/04 -0700, lei tang wrote:
>I am going to evaluate the classifier by AUC.  Is there any class
>about ROC or AUC in weka?
>I tried to find it, but in vain. 
>
>
>Thanks!
>Lei
>
>_______________________________________________
>Wekalist mailing list
>Wekalist@list.scms.waikato.ac.nz
>https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
>
>


Hans van Rijnberk

JRijnberk@wanadoo.nl 



From abendav at netvision.net.il  Mon Jan  3 19:55:05 2005
From: abendav at netvision.net.il (Dr. Arie Ben David)
Date: Mon Jan  3 19:54:22 2005
Subject: [Wekalist] WEKA Documentation
Message-ID: <001701c4f161$2a45dd80$c9a284d9@asus>

Hi everyone
I am considering using WEKA as the software tool for an undergraduate course in machine learning (we currently use Clementine). Can you kindly recommend a web site where students can find the theoretical background, an up-to-date description, examples, a bibliography, etc. for all (or most) of the models currently available in WEKA? (I am not talking about object-level details.)
Thank you 
Happy New Year
Dr. Arie Ben David


From tpederse at d.umn.edu  Mon Jan  3 20:13:05 2005
From: tpederse at d.umn.edu (ted pedersen)
Date: Mon Jan  3 20:14:09 2005
Subject: [Wekalist] WEKA Documentation
In-Reply-To: <001701c4f161$2a45dd80$c9a284d9@asus>
References: <001701c4f161$2a45dd80$c9a284d9@asus>
Message-ID: <Pine.GSO.4.58.0501030104340.5359@csdev18.d.umn.edu>


I recommend the book:

Data Mining: Practical Machine Learning Tools and Techniques with Java
Implementations by Ian H. Witten, Eibe Frank
http://www.cs.waikato.ac.nz/~ml/weka/book.html

Yes, I know this is sort of obvious and maybe not what you think you want,
but if you are teaching an undergrad class in machine learning, you really
do want this book. It's great. It's clear, it's concise, and it's even
sort of fun. I routinely refer students who are new to machine learning to
this book and they like it - they can understand it and it doesn't even
cost too much (compared to other Machine Learning books that shall remain
nameless ;) Besides, it's written with Weka in mind. It may not include
all the latest bells and whistles in Weka, but in an undergrad class
you'll probably be dealing with decision trees and Naive Bayesian
classifiers, etc. rather than freaky kernels and the like.

Also, I think Weka is an excellent choice for a classroom tool. It's
stable, easy to use, and has lots of room for growth. So it doesn't limit
very bright or ambitious students, while not being impossible for the more
average ones.

Cordially,
Ted


--
Ted Pedersen
http://www.d.umn.edu/~tpederse

From leoraffael at yahoo.com.br  Tue Jan  4 04:15:40 2005
From: leoraffael at yahoo.com.br (Leo)
Date: Tue Jan  4 05:18:58 2005
Subject: [Wekalist] CrossValidation Training
Message-ID: <001301c4f1a7$194e2280$77b7fea9@ospaivas>

Thanks for the previous answer, and forgive my English.

I used the MultilayerPerceptron classifier with 10-fold cross-validation. Nine folds should be the training set (training_instances) and one the test set (testing_instances), and this process is repeated 10 times, each time with a different key_fold (test fold), right?

My questions about the results in the CSV file:
Is Percent_Incorrect the error on the testing_instances?
How do I find the error on the training_instances? Is it possible?
What is the training error in this CSV file, after all?
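Here is a rough pure-Java sketch of the fold bookkeeping described above, using hypothetical names. For simplicity it assigns instance i to fold i % k. Weka's own cross-validation also shuffles the data first and, for a nominal class, stratifies it, so the exact fold membership will differ. What the sketch shows is that every instance is tested exactly once, by a model trained on the other nine folds.

```java
import java.util.ArrayList;
import java.util.List;

class CrossValidationFolds {
    /** Indices in test fold `fold` (simplified round-robin assignment, no shuffling). */
    static List<Integer> testIndices(int numInstances, int numFolds, int fold) {
        List<Integer> test = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            if (i % numFolds == fold) test.add(i);
        }
        return test;
    }

    /** Indices in the training set for `fold`: every instance of the other folds. */
    static List<Integer> trainIndices(int numInstances, int numFolds, int fold) {
        List<Integer> train = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            if (i % numFolds != fold) train.add(i);
        }
        return train;
    }

    public static void main(String[] args) {
        int n = 25, k = 10;
        for (int fold = 0; fold < k; fold++) {
            // A model would be built on trainIndices and its error measured on testIndices.
            System.out.println("fold " + fold + ": train=" + trainIndices(n, k, fold).size()
                    + " test=" + testIndices(n, k, fold).size());
        }
    }
}
```

If you evaluated each fold's model on its own training indices instead, you would get a training error. That figure is usually an optimistic estimate of performance on new data.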
From leunamelarama at yahoo.com.br  Tue Jan  4 06:01:43 2005
From: leunamelarama at yahoo.com.br (Emanuel Amaral Schimidt)
Date: Tue Jan  4 06:01:55 2005
Subject: [Wekalist] Saving results
Message-ID: <20050103170143.68293.qmail@web52405.mail.yahoo.com>

Hi everyone!

It's my first message on the list, and I'm a new user
of Weka.

I would like to save the trained model and load it
again later. That way, I don't have to run the data
through every time.

But... I didn't find how to do this. Can you help me,
please? How can I do that? Or doesn't Weka have this
feature?

Sincerely

Emanuel




From leunamelarama at yahoo.com.br  Tue Jan  4 08:56:52 2005
From: leunamelarama at yahoo.com.br (Emanuel Amaral Schimidt)
Date: Tue Jan  4 08:56:57 2005
Subject: [Wekalist] Saving results
Message-ID: <20050103195652.25623.qmail@web52405.mail.yahoo.com>

I'm using Weka in my own code (linking to the algorithms).
I print the results, but I would like to save these
results and reuse them in future mining runs (so I won't
need to mine the database again).

I hope you understand my poor English. Please correct
me, so that I can learn :-)

Thanks!

Emanuel

 --- David <pythonner@gmail.com> wrote: 
> Hello,
> 
> are you running Weka through the GUI or are you
> playing in the code directly?
> 
> David
> 


	
	
		

From glassner at umiacs.umd.edu  Tue Jan  4 09:25:05 2005
From: glassner at umiacs.umd.edu (Grazia Russo-Lassner)
Date: Tue Jan  4 09:25:19 2005
Subject: [Wekalist] Saving results
In-Reply-To: <20050103195652.25623.qmail@web52405.mail.yahoo.com>
References: <20050103195652.25623.qmail@web52405.mail.yahoo.com>
Message-ID: <Pine.GSO.4.61.0501031524050.13595@circle.umiacs.umd.edu>


If I am not mistaken, a flag on the command line (-o) allows you to 
specify the output filename.

Grazia


From glassner at umiacs.umd.edu  Wed Jan  5 08:16:09 2005
From: glassner at umiacs.umd.edu (Grazia Russo-Lassner)
Date: Wed Jan  5 08:16:24 2005
Subject: [Wekalist] Saving results
In-Reply-To: <20050104181456.86971.qmail@web52409.mail.yahoo.com>
References: <20050104181456.86971.qmail@web52409.mail.yahoo.com>
Message-ID: <Pine.GSO.4.61.0501041407580.27481@circle.umiacs.umd.edu>


Yes, you can do it.

For instance, in a Perl script (J48 here; substitute any other classifier you are using):
$cmd = "nice /path-to-java/java -cp /path-to-weka.jar/weka.jar -Dfile.encoding=ISO8859-1 weka.classifiers.trees.J48 -t train_file -d output-file-for-model -T test-file > filename-in-which-to-save-statistics";
system($cmd);
With the -d option you save your model to a file, and you can load it again
(with the -l option) when you need it.

Look in Chapter 8 (page 296) of Data Mining by Witten and Frank.
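From your own Java code, the same idea can be sketched with standard Java serialization, because Weka classifiers implement java.io.Serializable. To let the example run without weka.jar, a hypothetical TrainedModel class stands in for a trained classifier. With Weka, you would pass the classifier object after buildClassifier() instead. ModelStore and its methods are illustrative names and are not part of Weka.

```java
import java.io.*;

/** Hypothetical stand-in for a trained classifier; any Serializable model is saved the same way. */
class TrainedModel implements Serializable {
    private static final long serialVersionUID = 1L;
    final double threshold;

    TrainedModel(double threshold) { this.threshold = threshold; }

    String classify(double x) { return x >= threshold ? "yes" : "no"; }
}

class ModelStore {
    /** Writes the model to a stream (closing it afterwards). */
    static void save(Object model, OutputStream os) {
        try (ObjectOutputStream out = new ObjectOutputStream(os)) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a previously saved model back from a stream. */
    static Object load(InputStream is) {
        try (ObjectInputStream in = new ObjectInputStream(is)) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Saves and reloads through an in-memory buffer. */
    static Object roundTrip(Object model) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        save(model, buffer);
        return load(new ByteArrayInputStream(buffer.toByteArray()));
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("model", ".ser");
        save(new TrainedModel(0.5), new FileOutputStream(file));                // train once, persist
        TrainedModel restored = (TrainedModel) load(new FileInputStream(file)); // reuse later
        System.out.println(restored.classify(0.7)); // prints yes
        file.delete();
    }
}
```

A serialized model can generally only be read back by compatible versions of the classes that wrote it. For that reason, it is safest to keep using the same weka.jar when you reload.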

Grazia

On Tue, 4 Jan 2005, Emanuel Amaral Schimidt wrote:

> Hello Grazia!
>
> Thanks for your answer!
>
> I would like to know if I can reuse the learned model
> in Weka, i.e. tell Weka that this is a model that has
> already been learned, so that I won't need to train it
> again each time (I'd use the saved one).
>
> I'm using Weka in my own code, so how could I save
> the model and load the saved model from there?
>
> Thanks for your help and time
>
> Emanuel
>
From eibe at cs.waikato.ac.nz  Wed Jan  5 10:55:52 2005
From: eibe at cs.waikato.ac.nz (Eibe Frank)
Date: Wed Jan  5 10:56:02 2005
Subject: [Wekalist] Re: Wekalist Digest, Vol 22, Issue 24
In-Reply-To: <1104055383.41ce8c57e9dfe@rmc60-231.urz.tu-dresden.de>
References: <1104055383.41ce8c57e9dfe@rmc60-231.urz.tu-dresden.de>
Message-ID: <670F8644-5E9B-11D9-AB57-000A959DE03E@cs.waikato.ac.nz>


On Dec 26, 2004, at 11:03 PM, Samatha Kottha wrote:

>> The Resample filters in Weka perform sampling WITH replacement.
>> (Unfortunately the documentation hasn't been very clear about that but
>> we have fixed it recently in CVS.)
>>
>
> I am using Weka for training and testing C4.5, but for SVM I am
> using LibSVM.
> So far I have been using Resample to generate the random training
> set, saving it, and using that saved set for LibSVM. Does Resample with
> biasToUniformClass set to 0 also sample with replacement? When I
> checked it, it did not change the class proportions much.

Yes, Resample always does sampling with replacement. So you might end 
up with duplicate instances. However, the per-class proportions should 
be similar to the original data if you set that parameter to zero.
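Here is a simplified illustration of sampling with replacement. It is not the Resample filter's source, the class name is hypothetical, and the bias-towards-uniform-class option is not modelled. Each draw picks any instance independently. As a result, a sample as large as the data typically contains only about 63% distinct instances (the rest are duplicates), and its class proportions stay close to, but not exactly equal to, the original ones.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

class BootstrapSample {
    /** Draws sampleSize indices uniformly at random WITH replacement. */
    static int[] sampleWithReplacement(int numInstances, int sampleSize, Random rng) {
        int[] sample = new int[sampleSize];
        for (int i = 0; i < sampleSize; i++) {
            sample[i] = rng.nextInt(numInstances); // the same index may be drawn again
        }
        return sample;
    }

    /** Number of different instances that made it into the sample. */
    static int distinctCount(int[] sample) {
        Set<Integer> seen = new HashSet<>();
        for (int s : sample) seen.add(s);
        return seen.size();
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] labels = new int[n];
        for (int i = 0; i < n; i++) labels[i] = (i < 300) ? 1 : 0; // 30% class 1
        int[] sample = sampleWithReplacement(n, n, new Random(42));
        int ones = 0;
        for (int idx : sample) ones += labels[idx];
        System.out.println("distinct instances: " + distinctCount(sample) + " of " + n);
        System.out.println("class-1 proportion: " + ones / (double) n);
    }
}
```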

> We have holidays till the first week of January, I will
> send you the data some time in second week.

Great, thanks.

Cheers,
Eibe


From cplyon928 at comcast.net  Wed Jan  5 17:15:02 2005
From: cplyon928 at comcast.net (Clifford Lyon)
Date: Wed Jan  5 17:15:08 2005
Subject: [Wekalist] Factor Analysis
In-Reply-To: <670F8644-5E9B-11D9-AB57-000A959DE03E@cs.waikato.ac.nz>
References: <1104055383.41ce8c57e9dfe@rmc60-231.urz.tu-dresden.de>
	<670F8644-5E9B-11D9-AB57-000A959DE03E@cs.waikato.ac.nz>
Message-ID: <41DB69C6.7000307@comcast.net>

Hi, is there anyone on the list who has used Weka for factor analysis? 
I want to write a factor analysis class, probably maximum likelihood, 
but don't want to reinvent the wheel.

Thanks for any hints.

From zamanbaber at gmail.com  Thu Jan  6 03:29:11 2005
From: zamanbaber at gmail.com (Baber Zaman)
Date: Thu Jan  6 03:29:18 2005
Subject: [Wekalist] Text Classification And Weka/Judge
Message-ID: <d92c04ae050105062956367888@mail.gmail.com>

Hello All,

I am quite new to Weka and need help classifying text.

Can anybody tell me how I can use Weka / Judge for text mining?

I want to perform text mining on a large collection of documents, so I
would like to have:

Stop words removal
Stemming
TF x IDF weighting
KNN Classification

Can somebody provide an example of text classification?

Secondly, I want to build the feature matrix, store it somehow,
and use it to classify new documents in the future.

Can anybody help me with this: how can I get the feature matrix,
store it, and later build a classifier from it?
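Below is a small, self-contained Java sketch of the pipeline listed above. It leaves out stemming, which needs a real stemmer such as Porter's. It covers:

- stop-word removal;
- TF x IDF weighting, with tf = raw term count and idf = log(N / df) computed from the training documents only;
- 1-nearest-neighbour classification by cosine similarity.

The stop list, the tokenization and all names are illustrative assumptions, not Weka API. Within Weka itself, the weka.filters.unsupervised.attribute.StringToWordVector filter turns string attributes into word vectors, and weka.classifiers.lazy.IBk does k-nearest-neighbour classification.

```java
import java.util.*;

class TfIdfKnn {
    // Illustrative stop list; a real one would be much longer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "of", "and", "is", "in", "to"));

    /** Lower-cases, splits on non-letters and drops stop words. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) tokens.add(t);
        }
        return tokens;
    }

    /** idf(t) = log(N / df(t)), computed from the training documents only. */
    static Map<String, Double> idf(List<String> trainDocs) {
        Map<String, Integer> df = new HashMap<>();
        for (String doc : trainDocs) {
            for (String t : new HashSet<>(tokenize(doc))) df.merge(t, 1, Integer::sum);
        }
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            idf.put(e.getKey(), Math.log((double) trainDocs.size() / e.getValue()));
        }
        return idf;
    }

    /** TF x IDF vector; adding idf once per occurrence gives tf * idf. Unseen terms are dropped. */
    static Map<String, Double> vector(String doc, Map<String, Double> idf) {
        Map<String, Double> v = new HashMap<>();
        for (String t : tokenize(doc)) {
            Double w = idf.get(t);
            if (w != null) v.merge(t, w, Double::sum);
        }
        return v;
    }

    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double x : b.values()) nb += x * x;
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** 1-NN: returns the label of the most cosine-similar training document. */
    static String classify(List<String> trainDocs, List<String> labels, String query) {
        Map<String, Double> idf = idf(trainDocs);
        Map<String, Double> q = vector(query, idf);
        int best = 0;
        double bestSim = -1;
        for (int i = 0; i < trainDocs.size(); i++) {
            double sim = cosine(vector(trainDocs.get(i), idf), q);
            if (sim > bestSim) { bestSim = sim; best = i; }
        }
        return labels.get(best);
    }
}
```

The idf map together with the training vectors plays the role of the feature matrix. Storing both would let new documents be classified later without rebuilding them. The sketch recomputes them on each call only to stay short.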

Thanks in advance.





-- 
Baber Zaman
Master Student Software Systems Engineering
Aachen University Of Technology 
Germany.
Phone : (+)49-288-3066118
            (+)49-179-1489662(Handy)

From david_parker at clear.net.nz  Thu Jan  6 06:29:25 2005
From: david_parker at clear.net.nz (David Parker)
Date: Thu Jan  6 06:30:14 2005
Subject: [Wekalist] simple CLI
Message-ID: <001d01c4f34c$38241790$9faba7cb@healthotago.co.nz>

When I type the following at the simple CLI prompt :

java weka.core.Instances data/soybean.arff

I get the following response:

java.lang.Exception: Usage: Instances <filename>
 at weka.core.Instances.main(Unknown Source)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
 at java.lang.reflect.Method.invoke(Unknown Source)
 at weka.gui.SimpleCLI$ClassRunner.run(Unknown Source)
Usage: Instances <filename>

This suggests to me that the program cannot find the arff file. I installed
weka using the self-installer on Windows.

I have also extracted the weka-src.jar file, and I can compile and run the
resulting weka.core.Instances program as above from the DOS prompt, but this
is much less convenient than using the Simple CLI. Can anyone tell me how to
get the Simple CLI working properly? Thanks.


From david_parker at clear.net.nz  Thu Jan  6 11:00:29 2005
From: david_parker at clear.net.nz (David Parker)
Date: Thu Jan  6 11:01:12 2005
Subject: [Wekalist] Evaluation.evaluateModel() options
Message-ID: <000701c4f372$0e978990$227565da@healthotago.co.nz>

The following class is supposed to train a classifier using training.arff
then output a reclassified version of test.arff.

public class Test {
 private static String options[] = {"-t \"data\\training.arff\"",
                    "-T \"data\\test.arff\"",
                    "-p 0 2"};

  public static void main(String[] ops) {

     try {
       Instances i = new Instances(new FileReader("data/training.arff"));
    i.setClassIndex(i.numAttributes() - 1);
    J48 j = new J48();
    j.buildClassifier(i);

    Evaluation evaluation = new Evaluation(i);
    System.out.println(evaluation.evaluateModel(j, options));
     } catch (Exception e) {
       e.printStackTrace();
     }
  }
}

I get an exception:

java.lang.Exception:
Weka exception: Illegal option: -t "data\training.arff"

Why is this an illegal option?


From glassner at umiacs.umd.edu  Thu Jan  6 11:59:12 2005
From: glassner at umiacs.umd.edu (Grazia Russo-Lassner)
Date: Thu Jan  6 11:59:43 2005
Subject: [Wekalist] Evaluation.evaluateModel() options
In-Reply-To: <000701c4f372$0e978990$227565da@healthotago.co.nz>
References: <000701c4f372$0e978990$227565da@healthotago.co.nz>
Message-ID: <Pine.GSO.4.61.0501051758410.13464@circle.umiacs.umd.edu>


Try to pass the complete path to the file.




From eibe at cs.waikato.ac.nz  Thu Jan  6 13:35:36 2005
From: eibe at cs.waikato.ac.nz (Eibe Frank)
Date: Thu Jan  6 13:35:39 2005
Subject: [Wekalist] Evaluation.evaluateModel() options
In-Reply-To: <E1CmKOX-0002NR-AD@ghoul.scms.waikato.ac.nz>
References: <E1CmKOX-0002NR-AD@ghoul.scms.waikato.ac.nz>
Message-ID: <E1F665BF-5F7A-11D9-A9A4-000A959DE03E@cs.waikato.ac.nz>

Hi David,

Try something like

> private static String options[] = {"-t", "data\\training.arff",
>                     "-T", "data\\test.arff",
>                     "-p", "1-2"};

This should eliminate the exception. However, it won't give you an ARFF 
file. The format of the data output by the -p option is not ARFF.
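Here is a simplified sketch of why the original array fails. As the error message suggests, Weka's option handling roughly works like this: it looks for an array element that equals the flag, takes the next element as its value, and reports anything left unconsumed as an illegal option. A single element that packs the flag and the file name together therefore never matches "-t". The getOption and leftovers methods below are hypothetical stand-ins written for illustration, not Weka's actual code.

```java
import java.util.Arrays;

class OptionParsing {
    /**
     * Hypothetical, simplified flag lookup: finds the element equal to `flag`,
     * returns the element after it, and blanks both so leftovers can be detected.
     */
    static String getOption(String flag, String[] options) {
        for (int i = 0; i < options.length - 1; i++) {
            if (options[i].equals(flag)) {
                String value = options[i + 1];
                options[i] = "";
                options[i + 1] = "";
                return value;
            }
        }
        return ""; // flag not found as a separate element
    }

    /** Elements nobody consumed; a non-empty result corresponds to an "Illegal option" error. */
    static String[] leftovers(String[] options) {
        return Arrays.stream(options).filter(s -> !s.isEmpty()).toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] packed = {"-t \"data\\training.arff\""}; // flag and value in ONE element
        String[] split = {"-t", "data\\training.arff"};   // flag and value as separate elements
        System.out.println("[" + getOption("-t", packed) + "] leftovers=" + Arrays.toString(leftovers(packed)));
        System.out.println("[" + getOption("-t", split) + "] leftovers=" + Arrays.toString(leftovers(split)));
    }
}
```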

Cheers,
Eibe



From lzhu at deakin.edu.au  Thu Jan  6 18:38:58 2005
From: lzhu at deakin.edu.au (Ling Zhuang)
Date: Thu Jan  6 18:39:30 2005
Subject: [Wekalist] Naive Bayes Running problem
Message-ID: <5.1.1.5.2.20050106163511.03fe1ec0@mail.deakin.edu.au>


Hi, All

I am running Weka's multinomial Naive Bayes classifier
(weka.classifiers.bayes.NaiveBayesMultinomial) on a data set I created
myself. However, it throws this exception every time:

java.lang.IllegalArgumentException: Can't normalize array. Sum is NaN.
        at weka.core.Utils.normalize(Unknown Source)
        at weka.classifiers.bayes.NaiveBayesMultinomial.distributionForInstance(Unknown Source)
        at weka.classifiers.Evaluation.evaluateModelOnce(Unknown Source)
        at weka.classifiers.Evaluation.evaluateModel(Unknown Source)
        at weka.classifiers.Evaluation.evaluateModel(Unknown Source)
        at weka.classifiers.bayes.NaiveBayesMultinomial.main(Unknown Source)
Can't normalize array. Sum is NaN.

This only happens with weka.classifiers.bayes.NaiveBayesMultinomial; I have
tried J48 and it works fine. Does anyone know why this happens?

Thank you in advance!

Cheers.

Ling Zhuang


From ozric at web.de  Thu Jan  6 20:24:56 2005
From: ozric at web.de (Christian Schulz)
Date: Thu Jan  6 20:24:33 2005
Subject: [Wekalist] simple CLI
In-Reply-To: <001d01c4f34c$38241790$9faba7cb@healthotago.co.nz>
References: <001d01c4f34c$38241790$9faba7cb@healthotago.co.nz>
Message-ID: <41DCE7C8.4070406@web.de>

Hmm, curious: I have no success with the Simple CLI on Windows either, but
from the DOS window it works:

java weka.core.Instances ./data/soybean.arff

regards, christian





From agetman at haremhills.com  Thu Jan  6 21:00:54 2005
From: agetman at haremhills.com (Anya Getman)
Date: Thu Jan  6 21:01:07 2005
Subject: [Wekalist] Markov Models?
In-Reply-To: <200501052323.j05NN3ME015396@ssl.bigtimeservers.com>
References: <200501052323.j05NN3ME015396@ssl.bigtimeservers.com>
Message-ID: <41DCF036.2080203@haremhills.com>

Does Weka have any Markov models, and can someone who's played with the 
neural nets chat with me offline on what capabilities they have (and if 
I wanted to add some functionality, how could I)?

Thanks!



From hien at pmail.ntu.edu.sg  Thu Jan  6 22:05:21 2005
From: hien at pmail.ntu.edu.sg (#NGUYEN VAN HIEN#)
Date: Thu Jan  6 22:03:53 2005
Subject: [Wekalist] MDLP dicretization
Message-ID: <E030192F65406648905385147031729E49A908@mail03.student.main.ntu.edu.sg>

Hi all,
I would like to know whether anyone has implemented Fayyad's MDLP algorithm
for discretization and run it on the Iris dataset. I have implemented the
algorithm and tested it on Iris, but I don't know whether the result is
correct.

Regards
Nguyen Van Hien
From eibe at cs.waikato.ac.nz  Fri Jan  7 12:25:10 2005
From: eibe at cs.waikato.ac.nz (Eibe Frank)
Date: Fri Jan  7 12:25:15 2005
Subject: [Wekalist] MDLP dicretization
In-Reply-To: <E1Cmgtv-0003o1-Sv@ghoul.scms.waikato.ac.nz>
References: <E1Cmgtv-0003o1-Sv@ghoul.scms.waikato.ac.nz>
Message-ID: <358ABEE4-603A-11D9-B5A4-000A959DE03E@cs.waikato.ac.nz>

Weka's weka.filters.supervised.attribute.Discretize filter implements 
Fayyad & Irani's MDL-based method.
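To check a home-grown implementation against the definition, here is a compact sketch of Fayyad & Irani's recursive entropy split with the MDL stopping rule. Suppose a cut T splits a set S (N instances, k classes) into S1 and S2, which contain k1 and k2 classes. The cut is accepted only if

Gain(T;S) > log2(N - 1)/N + Delta/N, where Delta = log2(3^k - 2) - [k*Ent(S) - k1*Ent(S1) - k2*Ent(S2)].

This is an independent illustration, not the Discretize filter's code, and all names are hypothetical.

```java
import java.util.*;

class MdlDiscretizer {
    static double log2(double x) { return Math.log(x) / Math.log(2); }

    /** Class entropy of labels[lo, hi). */
    static double entropy(int[] labels, int lo, int hi, int numClasses) {
        int[] counts = new int[numClasses];
        for (int i = lo; i < hi; i++) counts[labels[i]]++;
        double n = hi - lo, e = 0;
        for (int c : counts) {
            if (c > 0) { double p = c / n; e -= p * log2(p); }
        }
        return e;
    }

    /** Number of distinct classes present in labels[lo, hi). */
    static int classesPresent(int[] labels, int lo, int hi, int numClasses) {
        boolean[] seen = new boolean[numClasses];
        int k = 0;
        for (int i = lo; i < hi; i++) {
            if (!seen[labels[i]]) { seen[labels[i]] = true; k++; }
        }
        return k;
    }

    /** Sorted cut points for one numeric attribute; labels are 0..numClasses-1. */
    static List<Double> cutPoints(double[] values, int[] labels, int numClasses) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> values[i]));
        double[] v = new double[n];
        int[] y = new int[n];
        for (int i = 0; i < n; i++) { v[i] = values[order[i]]; y[i] = labels[order[i]]; }
        List<Double> cuts = new ArrayList<>();
        split(v, y, 0, n, numClasses, cuts);
        Collections.sort(cuts);
        return cuts;
    }

    /** Splits the sorted range [lo, hi) at the best cut while the MDL criterion accepts it. */
    static void split(double[] v, int[] y, int lo, int hi, int numClasses, List<Double> cuts) {
        int n = hi - lo;
        if (n < 2) return;
        double entS = entropy(y, lo, hi, numClasses);
        int best = -1;
        double bestE = Double.POSITIVE_INFINITY;
        for (int i = lo + 1; i < hi; i++) {
            if (v[i] == v[i - 1]) continue; // cut only between distinct values
            double e = ((i - lo) * entropy(y, lo, i, numClasses)
                    + (hi - i) * entropy(y, i, hi, numClasses)) / n;
            if (e < bestE) { bestE = e; best = i; }
        }
        if (best < 0) return;
        double ent1 = entropy(y, lo, best, numClasses), ent2 = entropy(y, best, hi, numClasses);
        int k = classesPresent(y, lo, hi, numClasses);
        int k1 = classesPresent(y, lo, best, numClasses), k2 = classesPresent(y, best, hi, numClasses);
        double gain = entS - bestE;
        double delta = log2(Math.pow(3, k) - 2) - (k * entS - k1 * ent1 - k2 * ent2);
        if (gain <= (log2(n - 1) + delta) / n) return; // MDL rejects the cut: stop
        cuts.add((v[best - 1] + v[best]) / 2);
        split(v, y, lo, best, numClasses, cuts);
        split(v, y, best, hi, numClasses, cuts);
    }
}
```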

Cheers,
Eibe




From eibe at cs.waikato.ac.nz  Fri Jan  7 12:32:14 2005
From: eibe at cs.waikato.ac.nz (Eibe Frank)
Date: Fri Jan  7 12:32:19 2005
Subject: [Wekalist] Naive Bayes Running problem
In-Reply-To: <E1Cmgtv-0003o1-Sv@ghoul.scms.waikato.ac.nz>
References: <E1Cmgtv-0003o1-Sv@ghoul.scms.waikato.ac.nz>
Message-ID: <31E98A18-603B-11D9-B5A4-000A959DE03E@cs.waikato.ac.nz>

What kind of data are you trying to run it on? I suspect the input data  
is not suitable for it (it's usually applied to data where the  
attributes represent word counts). However, you should probably get a  
more sensible exception.

All the attributes must have positive numeric values (or be zero) for  
it to work.
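The following toy example shows why negative or missing values break a multinomial model. It assumes the usual textbook Laplace-smoothed estimate P(w|c) = (1 + count(w,c)) / (V + total(c)); Weka's internals may differ in detail. A negative count can make that estimate negative, and its logarithm is then NaN. The NaN propagates into the class score (even a zero word count times NaN is NaN), so normalising the scores fails as in the stack trace. All names are hypothetical.

```java
class MultinomialNbSketch {
    /** log P(w|c) with Laplace smoothing, from one class's per-word count totals. */
    static double[] logWordProbs(double[] wordTotals) {
        int vocab = wordTotals.length;
        double total = 0;
        for (double t : wordTotals) total += t;
        double[] logP = new double[vocab];
        for (int w = 0; w < vocab; w++) {
            // Math.log of a negative number is NaN; Math.log(0) is -Infinity.
            logP[w] = Math.log((1 + wordTotals[w]) / (vocab + total));
        }
        return logP;
    }

    /** Unnormalised log score of a document (word-count vector) for one class, prior omitted. */
    static double logScore(double[] doc, double[] logP) {
        double s = 0;
        for (int w = 0; w < doc.length; w++) s += doc[w] * logP[w]; // 0 * NaN is still NaN
        return s;
    }

    /** Precondition check: every value is non-negative and not missing (NaN in this sketch). */
    static boolean validCounts(double[] values) {
        for (double x : values) if (Double.isNaN(x) || x < 0) return false;
        return true;
    }

    public static void main(String[] args) {
        double[] goodTotals = {3, 5, 2};
        double[] badTotals = {-4, 5, 2}; // a negative "count" slipped into the training data
        double[] doc = {1, 0, 2};
        System.out.println(logScore(doc, logWordProbs(goodTotals))); // a finite number
        System.out.println(logScore(doc, logWordProbs(badTotals)));  // prints NaN
    }
}
```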

Cheers,
Eibe



From amk14 at cs.waikato.ac.nz  Fri Jan  7 12:50:50 2005
From: amk14 at cs.waikato.ac.nz (Ashraf Kibriya)
Date: Fri Jan  7 12:50:53 2005
Subject: [Wekalist] Naive Bayes Running problem
In-Reply-To: <E1Cmgxf-0003yU-Qu@ghoul.scms.waikato.ac.nz>
References: <E1Cmgxf-0003yU-Qu@ghoul.scms.waikato.ac.nz>
Message-ID: <41DDCEDA.1030907@cs.waikato.ac.nz>

Hi Ling,
What sort of dataset are you using? Are there any negative or missing 
values in there?
It shouldn't give this error if all the values, apart from the class, are
numeric, not missing (at least not in the test set), and greater than or
equal to zero.


Kind Regards,
Ashraf


From cplyon928 at comcast.net  Sun Jan  9 11:41:11 2005
From: cplyon928 at comcast.net (Clifford Lyon)
Date: Sun Jan  9 11:41:15 2005
Subject: [Wekalist] PCA attribute selection
In-Reply-To: <41DDCEDA.1030907@cs.waikato.ac.nz>
References: <E1Cmgxf-0003yU-Qu@ghoul.scms.waikato.ac.nz>
	<41DDCEDA.1030907@cs.waikato.ac.nz>
Message-ID: <41E06187.9090007@comcast.net>

I notice when I select attributes w/PCA and the Ranker that it leaves 
the last attribute in the set untouched.  I assume Weka regards this as 
a class attribute.  I am doing unsupervised learning, so there is no 
"class" - how can I tell the explorer this?

tia
From cplyon928 at comcast.net  Sun Jan  9 11:45:09 2005
From: cplyon928 at comcast.net (Clifford Lyon)
Date: Sun Jan  9 11:45:11 2005
Subject: [Wekalist] Re: PCA attribute selection
In-Reply-To: <41E06187.9090007@comcast.net>
References: <E1Cmgxf-0003yU-Qu@ghoul.scms.waikato.ac.nz>
	<41DDCEDA.1030907@cs.waikato.ac.nz> <41E06187.9090007@comcast.net>
Message-ID: <41E06275.20605@comcast.net>

Never mind, got it.




From bthom at cs.hmc.edu  Mon Jan 10 07:31:42 2005
From: bthom at cs.hmc.edu (belinda thom)
Date: Mon Jan 10 07:30:57 2005
Subject: [Wekalist] filtering questions
Message-ID: <B58D6118-626C-11D9-BF8C-000D93ACC694@cs.hmc.edu>

Hi

It appears WEKA does have PCA capabilities. I'm writing to find out  
more about this and other preprocessing utilities. In particular, does  
WEKA support Factor analysis? Multidimensional scaling? Can rotations  
of components (to make them "simpler" in some sense) be handled?

Advice greatly appreciated,

--b


Dr. Belinda Thom
------------------------------------------------------------------------
http://www.cs.hmc.edu/~bthom                        909-607-9662
Asst. Professor, Computer Science                   fax 607-8364
Harvey Mudd College                                 1241 Olin Hall
1250 Dartmouth Ave, Claremont, CA, 91711            physical address
301 E. 12th Street, Claremont, CA, 91711, USA       mailing address


From sionep at xtra.co.nz  Mon Jan 10 14:35:14 2005
From: sionep at xtra.co.nz (sione)
Date: Mon Jan 10 14:31:29 2005
Subject: [Wekalist] Re: filtering questions
In-Reply-To: <20050109233809.EMRZ560.mta5-rme.xtra.co.nz@ghoul.scms.waikato.ac.nz>
References: <20050109233809.EMRZ560.mta5-rme.xtra.co.nz@ghoul.scms.waikato.ac.nz>
Message-ID: <41E1DBD2.20700@xtra.co.nz>


Hi All,

I thought this PCA Java implementation might be useful to this list, or to
people who want to integrate it with WEKA. It requires JAMA (a Java matrix
package), which can be downloaded from the link below. The implementation
uses the SVD (SingularValueDecomposition), which is available in JAMA.
The PCA class is easy to modify to fit in with anyone's work or API:

http://math.nist.gov/javanumerics/jama/

I have not implemented Factor Analysis (FA), but I intend to do it at some
stage. If I write a Java FA, I will post it here so that anybody who is
interested can use it.
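For readers without JAMA, here is a dependency-free sketch of the first principal component only. Note that it swaps in power iteration on the sample covariance matrix in place of the SVD the class below uses. For centred data, both give the same leading direction, up to sign, when the top eigenvalue is distinct. This is an illustration with hypothetical names, not Sione's code, and it does not compute z-scores or Hotelling's T-squared.

```java
import java.util.Arrays;

class FirstPrincipalComponent {
    /** Sample covariance matrix (rows = observations, at least 2 rows). */
    static double[][] covariance(double[][] x) {
        int n = x.length, d = x[0].length;
        double[] mean = new double[d];
        for (double[] row : x) for (int j = 0; j < d; j++) mean[j] += row[j] / n;
        double[][] cov = new double[d][d];
        for (double[] row : x)
            for (int a = 0; a < d; a++)
                for (int b = 0; b < d; b++)
                    cov[a][b] += (row[a] - mean[a]) * (row[b] - mean[b]) / (n - 1);
        return cov;
    }

    /**
     * Power iteration: multiply by cov and renormalise, converging to the top eigenvector.
     * Assumes the start vector is not orthogonal to it and cov is not all zeros.
     */
    static double[] leadingEigenvector(double[][] cov, int iterations) {
        int d = cov.length;
        double[] v = new double[d];
        Arrays.fill(v, 1.0 / Math.sqrt(d));
        for (int it = 0; it < iterations; it++) {
            double[] w = new double[d];
            for (int a = 0; a < d; a++) for (int b = 0; b < d; b++) w[a] += cov[a][b] * v[b];
            double norm = 0;
            for (double c : w) norm += c * c;
            norm = Math.sqrt(norm);
            for (int a = 0; a < d; a++) v[a] = w[a] / norm;
        }
        return v;
    }

    /** Rayleigh quotient v'Cv: the variance along unit vector v (top eigenvalue at convergence). */
    static double eigenvalue(double[][] cov, double[] v) {
        double s = 0;
        for (int a = 0; a < v.length; a++)
            for (int b = 0; b < v.length; b++) s += v[a] * cov[a][b] * v[b];
        return s;
    }
}
```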

Cheers,
Sione.

---------------------------------------------------------------------------------------------------------------

/*
 * This class takes data X and computes the following matrices, which are
 * available through the getters after computePCA() has been called:
 * 1) Principal Components (coefficients) of the rawData
 * 2) Z-Scores (the centred data projected onto the principal components)
 * 3) Eigenvalues of the covariance matrix of rawData (latent)
 * 4) Hotelling's T-squared statistic for each data point
 * 5) zeroMean of the raw data (the column-centred data)
 *
 * References:
 * ----------
 * -  "A User's Guide to Principal Components", by J. Edward Jackson,
 *     pub by John Wiley & Sons, Inc. 1991, Chapter 1.
 *
 * -  "Applied Multivariate Techniques", by S. Sharma,
 *     pub by Wiley, Chapter 4.
 *
 * -  "Independent Component Analysis", by A. Hyvarinen, J. Karhunen & E. Oja,
 *     pub by John Wiley, Chapter 6.
 */


import Jama.*; // JAMA's package name is "Jama"

public class PCA {

  private Matrix rawData;
  private Matrix zeroMeanData;
  private Matrix pca;
  private Matrix zScores;
  private Matrix latent;
  private Matrix hotellingTSquared;

  public PCA(Matrix X) {
    this.rawData = X;
  }


  public static void main(String[] args) {
    int rows = 10;
    int cols = 7;

    //Create random matrix of size [rows x cols], scaled by 30
    Matrix X = Matrix.random(rows,cols).times(30.0); //or: new Matrix(data);
    System.out.println(" ");

    PCA principalComponents = new PCA(X);

    //main computation
    principalComponents.computePCA();

    Matrix pca = principalComponents.getPca();
    System.out.println("----- PCA -----");
    pca.print(4,4);

    System.out.println(" ");

    Matrix zScores = principalComponents.getZScores();
    System.out.println("----- Z-Scores -----");
    zScores.print(4,4);

    System.out.println(" ");

    Matrix latent = principalComponents.getLatent();
    System.out.println("----- Latent -----");
    latent.print(4,4);

    System.out.println(" ");

    Matrix tSquared = principalComponents.getHotellingTSquared();
    System.out.println("----- Hotellings T-Square -----");
    tSquared.print(4,4);
  }

  public void setRawData(Matrix rawData){
    this.rawData = rawData;
    zeroMeanData = null;
    pca = null;
    zScores = null;
    latent = null;
    hotellingTSquared = null;
   }

  public Matrix getRawData(){
    return rawData;
   }

  public Matrix getPca(){
    return pca;
   }

  public Matrix getZeroMeanData(){
    return zeroMeanData;
   }

  public Matrix getZScores(){
    return zScores;
   }

  public Matrix getLatent(){
    return latent;
   }

  public Matrix getHotellingTSquared(){
    return this.hotellingTSquared;
   }

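  /**
   * Centres the data, performs the SVD and fills in the principal components,
   * Z-scores, eigenvalues (latent) and Hotelling's T-squared statistic.
   */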
  public void computePCA(){
    Matrix X = rawData;
    int m = X.getRowDimension();
    int n = X.getColumnDimension();
    int rank = Math.min(m-1,n);

    //The following is to prevent 'out-of-bound array exception' thrown
    //in SingularValueDecomposition, which expects at least as many rows as columns
    if(m<2){
      throw new IllegalArgumentException("pca - there must be at least 2 observations (2 rows) in the data :");
    }
    if(m<n){
      throw new IllegalArgumentException("pca - data has more variables than observations (wide matrix or under-determined linear systems) :");
    }

    Matrix average = mean(X);
    //center the data by first removing the mean (average)
    zeroMeanData = X.minus(tile(average,m,1));

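    //scale by 1/sqrt(m-1) so that the squared singular values are the
    //eigenvalues of the covariance matrix
    SingularValueDecomposition svd =
        new SingularValueDecomposition(zeroMeanData.times(1.0/Math.sqrt((double)m - 1.0)));

    pca = svd.getV(); // PCA
    zScores  = zeroMeanData.times(pca); //Z-Scores
    Matrix temp = diag(svd.getS());
    latent    = temp.arrayTimes(temp); //Eigenvalues of the covariance matrix of X

    if(rank<n){
      //only the first 'rank' components carry variance; zero out the rest
      latent = mergeV(latent.getMatrix(0,rank-1,0,0), new Matrix(n-rank,1));
      zScores.setMatrix(0,m-1,rank,n-1,new Matrix(m,n-rank,0.0D));
     }

    //Hotelling's T-squared statistic: for each observation, the sum over the
    //first 'rank' components of score^2 / eigenvalue
    Matrix temp2 = latent.getMatrix(0,rank-1,0,0);
    Matrix ones  = new Matrix(temp2.getRowDimension(), temp2.getColumnDimension(),1.0D);
    Matrix invSqrtLatent = sqrt(ones.arrayRightDivide(temp2)); // [rank x 1]
    Matrix scale = new Matrix(rank,rank);                      // diagonal matrix
    for(int k=0; k<rank; k++){
      scale.set(k,k,invSqrtLatent.get(k,0));
    }
    temp = scale.times(zScores.getMatrix(0,m-1,0,rank-1).transpose()); // [rank x m]
    hotellingTSquared = sum(temp.arrayTimes(temp)).transpose();        // [m x 1]
  }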



  /**
   * Sums the elements of matrix 'S' down each column
   * @param S Matrix [row x col]
   * @return Matrix [1 x col] row vector of column sums
   */
  private Matrix sum(Matrix S){
    double[][] internal = S.getArray();
    int row = S.getRowDimension();
    int col = S.getColumnDimension();

    double[][] summing = new double[1][col];
    for(int j=0 ; j<col ; j++){
      double temp = 0.0;
      for(int i=0 ; i<row ; i++){ temp += internal[i][j]; }
      summing[0][j] = temp;
    }
    return new Matrix(summing);
  }


  /**
   * Taking the square-roots of each entry of matrix X
   * @param X Matrix
   * @return Matrix
   */
  private Matrix sqrt(Matrix X){
    int m = X.getRowDimension();
    int n = X.getColumnDimension();
    double[][] xArray = X.getArray();
    Matrix R = new Matrix(m,n);
    double[][] C = R.getArray();
    for(int i=0; i<m; i++){
      for(int j=0; j<n; j++){
        C[i][j] = Math.sqrt(xArray[i][j]);
      }
    }
    return R;
  }

  /**
   * Merge two matrices vertically. If the 2 matrices have different column numbers,
   * an exception is thrown. Merging 'A' and 'B' leads to a larger matrix of:
   *      /   \
   *      | A |
   *  C = |   |
   *      | B |
   *      \   /
   *
   * @param A Matrix
   * @param B Matrix
   * @return Matrix
   */
  private Matrix mergeV(Matrix A, Matrix B){
    int m   = A.getRowDimension();
    int n   = A.getColumnDimension();
    int b_m = B.getRowDimension();
    int b_n = B.getColumnDimension();

    if(n!=b_n){
      throw new IllegalArgumentException(" mergeV : Matrix column dimensions must agree (same).");
     }
    int newRow = m+b_m;
    Matrix R = new Matrix(newRow,n);
    R.setMatrix(0,m-1,0,n-1,A);
    R.setMatrix(m,m+b_m-1,0,n-1,B);
    return R;
  }

  /**
   * Find the mean along each column of Matrix X
   * @param X Matrix
   * @return Matrix
   */
  private Matrix mean(Matrix X){
    int rows = X.getRowDimension();
    int cols = X.getColumnDimension();
    double[][] xarray = X.getArray();
    Matrix R = new Matrix(1,cols);
    double[][] C = R.getArray();
    double sum = 0.0;
    for(int j=0; j<cols; j++){
      for(int i=0; i<rows; i++){
        sum += xarray[i][j];
      }
     C[0][j] = sum/((double)rows);
     sum = 0.0; //reset sum to zero
    }
   return R;
  }

  /**
   * Tiling of matrix X in [rowWise by colWise] dimension. Tiling creates a larger
   * matrix than the original data X. Example, if X is to be tiled in a [3 x 4] manner, then
   *     /            \
   *     | X  X  X  X |
   * C = | X  X  X  X |
   *     | X  X  X  X |
   *     \           /
   * @param X Matrix
   * @param rowWise int
   * @param colWise int
   * @return Matrix
   */
  private Matrix tile(Matrix X, int rowWise, int colWise){
     double[][] xArray = X.getArray() ;
     int countRow = 0, countColumn = 0;
     int m = X.getRowDimension();
     int n = X.getColumnDimension();

     if( rowWise<1 || colWise<1 ){
       throw new ArrayIndexOutOfBoundsException("tile : Array index is out-of-bound.");
      }

     int newRowDim = m*rowWise;
     int newColDim = n*colWise;
     double[][] result = new double[newRowDim][];

     for(int i=0 ; i<newRowDim ; i++){
       double[] holder = new double[newColDim];
       for(int j=0 ; j<newColDim ; j++){
          holder[j] = xArray[countRow][countColumn++];
          //reset the column-index to zero to avoid reference to out-of-bound index in xArray[][]
          if(countColumn == n){ countColumn = 0; }
         }//end for
       countRow++;
       //reset the row-index to zero to avoid reference to out-of-bound index in xArray[][]
       if(countRow == m){ countRow = 0; }
       result[i] = holder;
     }//end for

      return new Matrix(result);
   }

   /**
    * Return a column vector matrix whose elements are the main diagonal of X
    * @param X Matrix
    * @return Matrix
    */
   private Matrix diag(Matrix X){
    int rows = X.getRowDimension();
    int cols = X.getColumnDimension();

    double[][] xArray = X.getArray();
    int minDim = Math.min(rows,cols);

    Matrix R = new Matrix(minDim,1);
    double[][] C = R.getArray();

    for(int i=0; i<minDim; i++){
      C[i][0] = xArray[i][i];
    }
    return R;
  }



}//------------------------ End Class Definition -------------------------------
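For reference, the quantities that computePCA() is meant to produce can be summarised as follows, for m observations (rows) and n variables (columns). This is a summary of the code above in its own variable names, not a formula quoted from the cited references; the diagonal scaling in the last line is what the Hotelling T-squared step has to implement.

```latex
\begin{aligned}
X_c &= X - \mathbf{1}\,\bar{x}^{\top}, \qquad \bar{x}_j = \tfrac{1}{m}\sum_{i=1}^{m} x_{ij}
  \quad (\texttt{zeroMeanData}) \\
\frac{X_c}{\sqrt{m-1}} &= U S V^{\top} \quad \text{(SVD)} \\
\texttt{pca} &= V, \qquad Z = X_c V \quad (\texttt{zScores}) \\
\lambda_k &= s_k^2 \quad (\texttt{latent}), \qquad
  C = \frac{X_c^{\top} X_c}{m-1} = V S^2 V^{\top} \\
r &= \min(m-1,\, n), \qquad \lambda_k = 0,\; z_{ik} = 0 \ \text{ for } k > r \\
T_i^2 &= \sum_{k=1}^{r} \frac{z_{ik}^2}{\lambda_k} \quad (\texttt{hotellingTSquared})
\end{aligned}
```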



From sionep at xtra.co.nz  Mon Jan 10 14:53:13 2005
From: sionep at xtra.co.nz (sione)
Date: Mon Jan 10 14:49:10 2005
Subject: [Wekalist] Re: filtering questions
In-Reply-To: <20050109233809.EMRZ560.mta5-rme.xtra.co.nz@ghoul.scms.waikato.ac.nz>
References: <20050109233809.EMRZ560.mta5-rme.xtra.co.nz@ghoul.scms.waikato.ac.nz>
Message-ID: <41E1E009.3050800@xtra.co.nz>


Oops, I made a mistake. Please note that the PCA class in my previous
email should have the constructor written as follows:

public PCA(Matrix X) {
   this.rawData = X;
  }

but not as :

public PCA( ) { }

Cheers,
Sione.


From javier at azulcielo.com  Mon Jan 10 15:05:23 2005
From: javier at azulcielo.com (Javier Albarracin)
Date: Mon Jan 10 15:07:19 2005
Subject: [Wekalist] Books or Webpages
In-Reply-To: <20050109233634.0323641991@mail01.powweb.com>
Message-ID: <20050110020721.8C1B441446@mail01.powweb.com>

Hello,

I see lots of messages about the technical (Java) side of Weka. Can
somebody recommend some good textbooks (or web pages) that can guide me in
understanding the classifier and clustering algorithms built into Weka?

I already own the Data Mining book written by Ian H. Witten and Eibe Frank
(which is a wonderful introduction to machine learning algorithms), but what
I am searching for is another source of information on all the algorithms
included in Weka.

Thank you in advance,
Javier Albarracín
Lima, Perú


From hien at pmail.ntu.edu.sg  Mon Jan 10 17:59:59 2005
From: hien at pmail.ntu.edu.sg (#NGUYEN VAN HIEN#)
Date: Mon Jan 10 17:57:58 2005
Subject: [Wekalist] discretization
Message-ID: <E030192F65406648905385147031729E49A912@mail03.student.main.ntu.edu.sg>

Dear all,
I'm implementing the MDLP discretization algorithm (Fayyad 1993) in C.
After discretizing, the problems of redundancy and inconsistency may
occur.
For example:
+ Redundancy:
            F1    F2    F3    F4    Class
            0     0     0     1     1
            0     0     0     1     1

+ Inconsistency:
            F1    F2    F3    F4    Class
            0     0     0     1     1
            0     0     0     1     3

I would like to ask how to deal with these issues.
Regards
Nguyen Van Hien
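The two situations in the example can at least be detected mechanically. The following is a minimal sketch, written in Java rather than C and not part of MDLP or Weka; the class and method names are hypothetical, and it assumes each discretized row is an int array with the class label in the last column. It only counts the cases; it does not decide how to resolve them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: detects redundant and inconsistent discretized rows. */
final class DiscretizedRowChecks {
    private DiscretizedRowChecks() { }

    /** Counts rows that exactly repeat an earlier row (features and class). */
    static int countRedundant(int[][] rows) {
        Set<List<Integer>> seen = new HashSet<>();
        int duplicates = 0;
        for (int[] r : rows) {
            if (!seen.add(toList(r, r.length))) {
                duplicates++;
            }
        }
        return duplicates;
    }

    /** Counts distinct feature vectors that occur with more than one class label. */
    static int countInconsistent(int[][] rows) {
        Map<List<Integer>, Set<Integer>> classesByFeatures = new HashMap<>();
        for (int[] r : rows) {
            List<Integer> features = toList(r, r.length - 1); // drop the class column
            classesByFeatures.computeIfAbsent(features, k -> new HashSet<>()).add(r[r.length - 1]);
        }
        int inconsistent = 0;
        for (Set<Integer> labels : classesByFeatures.values()) {
            if (labels.size() > 1) {
                inconsistent++;
            }
        }
        return inconsistent;
    }

    /** Copies the first len entries of a row into a list usable as a hash key. */
    private static List<Integer> toList(int[] row, int len) {
        List<Integer> list = new ArrayList<>(len);
        for (int i = 0; i < len; i++) {
            list.add(row[i]);
        }
        return list;
    }

    public static void main(String[] args) {
        int[][] redundant = {{0, 0, 0, 1, 1}, {0, 0, 0, 1, 1}};
        int[][] inconsistent = {{0, 0, 0, 1, 1}, {0, 0, 0, 1, 3}};
        System.out.println("redundant rows: " + countRedundant(redundant));
        System.out.println("inconsistent feature vectors: " + countInconsistent(inconsistent));
    }
}
```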

 



From hien at pmail.ntu.edu.sg  Mon Jan 10 19:52:53 2005
From: hien at pmail.ntu.edu.sg (#NGUYEN VAN HIEN#)
Date: Mon Jan 10 19:51:27 2005
Subject: [Wekalist] how to use C4.5 algorithm for classifying
Message-ID: <E030192F65406648905385147031729E49A913@mail03.student.main.ntu.edu.sg>


Hi Mr.Eibe Frank,

I'm a newcomer to Weka and don't know how to use the Weka software. Would
you mind answering two questions?

+ If I have a continuous dataset, what steps should I do to discretize
it using Weka software (without writing code)?

Format of the continuous dataset (text file), for example:
	F1	F2	F3	Class
	0.1	0.2	1.3	2
	....
+ If I have a discrete dataset, what steps should I do to run C4.5
classification algorithm on it using Weka software (without writing
code)?

Format of the discrete dataset (text file), for example:
	F1	F2	F3	Class
	1	2	3	2
	...
If you have any documents covering the above problems, please send them
to me.

Thank you 

Regards
Nguyen Van Hien

From stijn.lievens at ugent.be  Mon Jan 10 23:30:20 2005
From: stijn.lievens at ugent.be (Stijn Lievens)
Date: Mon Jan 10 23:30:39 2005
Subject: [Wekalist] (un)expected behaviour of Experimenter?
Message-ID: <41E2593C.9030106@ugent.be>

Hi Weka users and implementers,

Just now, I noticed the following -- in my opinion -- strange behaviour 
of the experimenter.

I did the following (I'm working with Linux)

cd $HOME
java weka.gui.experiment.Experimenter
<configure advanced experiment>
save experiment under $HOME/other_dir

I thus created and saved an experiment in an *.exp file.

Now, when I tried

cd $HOME/other_dir
java weka.gui.experiment.Experimenter
<try to open an advanced experiment>

this failed, and the experiment could not be opened.

When I tried

cd $HOME
java weka.gui.experiment.Experimenter
<try to open an advanced experiment>

then everything worked out nicely.


In short, it looks like one has to start the Experimenter from exactly
the same location if one wants to reuse an existing experiment, which
looks like a serious limitation to me. Also (I have not yet tested
this), won't this prevent configuring an experiment on one machine and
executing it (after copying) on another machine?

Kind regards,

Stijn Lievens.


-- 
==========================================================================
Dept. of Applied Mathematics and Computer Science, University of Ghent
Krijgslaan 281 - S9, B - 9000 Ghent, Belgium
E-mail: Stijn.Lievens@ugent.be, URL: http://allserv.ugent.be/~slievens/
==========================================================================

From saunier at enst.fr  Tue Jan 11 07:17:39 2005
From: saunier at enst.fr (Saunier Nicolas)
Date: Tue Jan 11 07:18:27 2005
Subject: [Wekalist] T-paired tests
Message-ID: <E9A26208-6333-11D9-9102-000A95BB8B1A@enst.fr>

Dear weka users,

I would like to evaluate the statistical significance of some results,
and the paired t-test (T-Test statistics) seems to be implemented in
Weka, but I don't know how to use it in my own code. Could someone
explain to me how to do it, or give me a link/reference on how to
implement a simple statistical test for, say, 10 runs of 10-fold
cross-validation?
Thanks for any help,

Nicolas Saunier
--
http://www.infres.enst.fr/~saunier


From pythonner at gmail.com  Tue Jan 11 08:56:07 2005
From: pythonner at gmail.com (David)
Date: Tue Jan 11 08:56:14 2005
Subject: [Wekalist] T-paired tests
In-Reply-To: <E9A26208-6333-11D9-9102-000A95BB8B1A@enst.fr>
References: <E9A26208-6333-11D9-9102-000A95BB8B1A@enst.fr>
Message-ID: <a39a6670050110115653200c7b@mail.gmail.com>

Hello,

I would say the best explanations can be found on this page:

http://home.clara.net/sisa/instr.htm

In practice, it is handy to use Microsoft Excel to compute the t-test.
You can simply use the built-in "t-test" function.
Alternatively, you can enable the "Analysis ToolPak" add-in
(Tools\Add-Ins), which gives you the "Data Analysis" tool. Given two
columns filled with the results of the 10 folds, it produces a
complete analysis. Look for P(T<=t). It should be lower than 0.05
for your test to be statistically significant at the 95% level, for
instance.

David


On Mon, 10 Jan 2005 19:17:39 +0100, Saunier Nicolas <saunier@enst.fr> wrote:
> Dear weka users,
> 
> I would like to evaluate the statistical significance of some results,
> and the t paired test (T-Test statistics) seems to be implemented in
> weka, but I don't know how to use it in my own code. Could someone
> explain me how to do it, or give me some link/reference on how to
> implement a simple statistical test, for, say, 10 trials of 10 fold
> cross-validation ?
> Thanks for any help,
> 
> Nicolas Saunier
> --
> http://www.infres.enst.fr/~saunier
> 
> 


-- 
Balie - Baseline Information Extraction
http://balie.sourceforge.net
[Open Source ~ 100% Java ~ Using Weka ~ Multilingual]

From andy_liaw at merck.com  Tue Jan 11 09:51:05 2005
From: andy_liaw at merck.com (Liaw, Andy)
Date: Tue Jan 11 09:51:40 2005
Subject: [Wekalist] T-paired tests
Message-ID: <3A822319EB35174CA3714066D590DCD50994E4F8@usrymx25.merck.com>

> From: David
> 
> Hello,
> 
> I would say the best explanations can be found on this page:
> 
> http://home.clara.net/sisa/instr.htm
> 
> Practically, this is handy to use Microsoft Excel to compute 
> the t-test.
> You can simply use the built-in "t-test" function.
> Alternatively, you can enable the "analysis toolpack" add-in
> (tools\add-ins) that will gives you the "data analysis" tool. 
> Given two
> column filled with your the results of the 10 folds, it produces a
> complete analysis. Look for the P(T<=t). It should be lower than 0,05
> for you test to be statistically significant at the 95% level, for
> instance.
> 
> David

Excuse me for nitpicking, but I guess you meant 5% level?  `Level' of a test
(or alpha) refers to the upper bound on the type I error (false positive
rate; rejecting the null hypothesis when it's true).

I'd try to avoid Excel for even simple Statistics, as it is notoriously bad.
See, e.g., http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html
and links from there.

A paired t-test is just a one-sample t-test using the differences of the
pairs as data, testing whether the mean difference is 0. You don't need
fancy tools for that.
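The paired t-test described above can be sketched in a few lines of Java. This is a minimal illustration, not Weka's own tester; the class and method names are hypothetical, and the input arrays stand for per-fold scores of two classifiers on the same folds. The resulting |t| would be compared with the two-sided critical value from a t table with n - 1 degrees of freedom (about 2.262 for n = 10 folds at the 5% level).

```java
/** Hypothetical helper: paired t statistic as a one-sample t-test on differences. */
final class PairedTTest {
    private PairedTTest() { }

    /**
     * Returns t = mean(d) / (sd(d) / sqrt(n)), where d[i] = a[i] - b[i] and sd is
     * the sample standard deviation (n - 1 in the denominator). df = n - 1.
     */
    static double tStatistic(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("need two samples of equal length >= 2");
        }
        int n = a.length;
        double mean = 0.0;
        for (int i = 0; i < n; i++) {
            mean += a[i] - b[i];
        }
        mean /= n;
        double sumSq = 0.0;
        for (int i = 0; i < n; i++) {
            double dev = (a[i] - b[i]) - mean;
            sumSq += dev * dev;
        }
        double sd = Math.sqrt(sumSq / (n - 1));
        // If all differences are identical, sd is 0 and t is infinite (or NaN).
        return mean / (sd / Math.sqrt(n));
    }

    public static void main(String[] args) {
        // Hypothetical per-fold scores of two classifiers on the same 5 folds.
        double[] a = {2, 4, 6, 8, 10};
        double[] b = {1, 2, 3, 4, 5};
        System.out.println("t = " + tStatistic(a, b) + ", df = " + (a.length - 1));
    }
}
```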

Andy 
 

> On Mon, 10 Jan 2005 19:17:39 +0100, Saunier Nicolas 
> <saunier@enst.fr> wrote:
> > Dear weka users,
> > 
> > I would like to evaluate the statistical significance of 
> some results,
> > and the t paired test (T-Test statistics) seems to be implemented in
> > weka, but I don't know how to use it in my own code. Could someone
> > explain me how to do it, or give me some link/reference on how to
> > implement a simple statistical test, for, say, 10 trials of 10 fold
> > cross-validation ?
> > Thanks for any help,
> > 
> > Nicolas Saunier
> > --
> > http://www.infres.enst.fr/~saunier
> > 
> > 
> 
> 
> -- 
> Balie - Baseline Information Extraction
> http://balie.sourceforge.net
> [Open Source ~ 100% Java ~ Using Weka ~ Multilingual]
> 
> 
> 



From THOMAS.C.JOHNSON at saic.com  Tue Jan 11 10:14:41 2005
From: THOMAS.C.JOHNSON at saic.com (Johnson, Thomas C.)
Date: Tue Jan 11 10:15:02 2005
Subject: [Wekalist] Weka and the Java VM
Message-ID: <0A5C43B71EC8EE4F8393BCADE546841A03379A50@vie-its-exs02.mail.saic.com>

Hello All,

I read in the archives that there's a compatibility issue between Weka and JVM
1.4.2_06-b03, and that the solution is to revert to 1.4.2_05.  I've
downloaded and installed j2re-1.4.2_05-fcs.rpm using rpm.  However, when I
run Weka, Java continues to use 1.4.2_06-b03.  How do I tell java to use
version 1.4.2_05 instead of 1.4.2_06?  

This is on Linux, by the way.
--TcJ

From tgd at cs.orst.edu  Tue Jan 11 10:43:50 2005
From: tgd at cs.orst.edu (Thomas G. Dietterich)
Date: Tue Jan 11 10:44:04 2005
Subject: [Wekalist] T-paired tests
References: <E9A26208-6333-11D9-9102-000A95BB8B1A@enst.fr>
	<a39a6670050110115653200c7b@mail.gmail.com>
Message-ID: <722-Mon10Jan2005134350-0800-tgd@cs.orst.edu>

The paired T-test should never be used for comparing learning
algorithms.  It has very high Type I error and is not sound.
I recommend either McNemar's test or the 5x2cv F test developed by
Alpaydin.  It is a pity that neither of these tests is implemented in
WEKA. 

For more details, see

@article{a-cftcscla-99,
author = {E. Alpaydin},
year = 1999,
title = {Combined 5x2cv {F} Test for Comparing Supervised
Classification Learning Algorithms},
journal = {Neural Computation},
volume = 11,
number = 8, 
pages = {1885--1892}}

and

@article{d-astdscla-98,
title = {Approximate Statistical Tests for Comparing
Supervised Classification Learning Algorithms},
author = {Thomas G. Dietterich},
journal = {Neural Computation},
volume = {10},
number = 7,
pages = {1895--1924},
year = {1998}}

The latter is available from my web page.
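Since neither test is in WEKA, here is a rough Java sketch of both statistics, computed from quantities you would collect yourself. It is not code from the cited papers, and the class, method and constant names are hypothetical. McNemar's test uses the continuity-corrected statistic on the disagreement counts of two classifiers on one test set; the 5x2cv F statistic uses the differences in error rate on each fold of five replications of 2-fold cross-validation. The critical values 3.841 (chi-square, 1 df) and about 4.74 (F with 10 and 5 df) are the usual 5% table values.

```java
/** Hypothetical sketches of McNemar's test and the combined 5x2cv F test. */
final class LearnerComparisonTests {
    private LearnerComparisonTests() { }

    /** Chi-square critical value, 1 degree of freedom, 5% level. */
    static final double CHI2_1DF_5PCT = 3.841;
    /** F critical value with (10, 5) degrees of freedom, 5% level. */
    static final double F_10_5_5PCT = 4.74;

    /**
     * McNemar's statistic with continuity correction.
     * n01: test examples misclassified by A but not by B.
     * n10: test examples misclassified by B but not by A.
     */
    static double mcNemar(int n01, int n10) {
        if (n01 < 0 || n10 < 0 || n01 + n10 == 0) {
            throw new IllegalArgumentException("need non-negative counts with n01 + n10 > 0");
        }
        double diff = Math.abs(n01 - n10) - 1.0;
        return diff * diff / (n01 + n10);
    }

    /**
     * Combined 5x2cv F statistic. p[i][j] is the difference between the error
     * rates of A and B on fold j (0 or 1) of replication i (0..4).
     * Undefined (division by zero) if both folds agree in every replication.
     */
    static double fiveByTwoCvF(double[][] p) {
        if (p.length != 5) {
            throw new IllegalArgumentException("need 5 replications");
        }
        double numerator = 0.0;
        double varianceSum = 0.0;
        for (double[] rep : p) {
            if (rep.length != 2) {
                throw new IllegalArgumentException("need 2 folds per replication");
            }
            double mean = (rep[0] + rep[1]) / 2.0;
            numerator += rep[0] * rep[0] + rep[1] * rep[1];
            varianceSum += (rep[0] - mean) * (rep[0] - mean) + (rep[1] - mean) * (rep[1] - mean);
        }
        return numerator / (2.0 * varianceSum);
    }

    public static void main(String[] args) {
        double chi2 = mcNemar(10, 2); // hypothetical disagreement counts
        System.out.println("McNemar = " + chi2 + ", significant at 5%: " + (chi2 > CHI2_1DF_5PCT));
    }
}
```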

-- 
Thomas G. Dietterich, Professor   Voice: 541-737-5559
School of Electrical Engineering  FAX:   541-737-3014
  and Computer Science            URL:   http://www.cs.orst.edu/~tgd
Dearborn Hall 102, Oregon State University, Corvallis, OR 97331-3102     
--




From eibe at cs.waikato.ac.nz  Tue Jan 11 11:13:49 2005
From: eibe at cs.waikato.ac.nz (Eibe Frank)
Date: Tue Jan 11 11:13:54 2005
Subject: [Wekalist] Books or Webpages
In-Reply-To: <E1Co7EU-0003FV-BI@ghoul.scms.waikato.ac.nz>
References: <E1Co7EU-0003FV-BI@ghoul.scms.waikato.ac.nz>
Message-ID: <E728117C-6354-11D9-A7E4-000A959DE03E@cs.waikato.ac.nz>

In most cases there should be a reference to a publication describing 
the algorithm in the Javadoc of the corresponding class (i.e. in the 
comments extracted from the source code).

Cheers,
Eibe

On Jan 11, 2005, at 10:37 AM, wekalist-request@list.scms.waikato.ac.nz 
wrote:

> Hello,
>
> I see lots of messages about the technical part or Java part of Weka. 
> Can
> somebody refer some good text books (or web pages) that can guide me on
> understanding the classifier and cluster algorithms built on Weka?
>
> I already own the Datamining book written by Ian H. Witten and Eibe 
> Frank
> (which is a wonderful introduction to machine learning algorithms) but 
> what
> I am searching is another source of information on all the included
> algorithms on Weka.
>
> Thank you in Advance,
> Javier Albarracín
> Lima, Perú


From javier at azulcielo.com  Tue Jan 11 16:16:44 2005
From: javier at azulcielo.com (Javier Albarracin)
Date: Tue Jan 11 16:16:33 2005
Subject: [Wekalist] Books or Webpages
In-Reply-To: <20050110233628.844DD41429@mail01.powweb.com>
Message-ID: <000001c4f78b$fb4e3d20$be00a8c0@JAlbarracinx>

Dear Eibe,

Thank you. I have already checked the Javadocs, e.g. the one for the EM
(expectation maximisation) class at:
http://www.cs.waikato.ac.nz/~ml/weka/doc_gui/weka/clusterers/EM.html
but it does not include a reference to a publication describing the
algorithm. Am I searching where you pointed me?

Thanks, Javier


From Tim.DeMeyer at Ugent.be  Tue Jan 11 23:51:54 2005
From: Tim.DeMeyer at Ugent.be (Tim De Meyer)
Date: Tue Jan 11 23:51:42 2005
Subject: [Wekalist] Interaction effect
Message-ID: <005101c4f7cb$90aaf040$4b55c19d@MAXDATAF41B9B6>

Skipped content of type multipart/alternative
From Paul.Lamere at Sun.COM  Wed Jan 12 03:19:46 2005
From: Paul.Lamere at Sun.COM (Paul Lamere)
Date: Wed Jan 12 03:19:59 2005
Subject: [Wekalist] Re: Problematic Thread
Message-ID: <41E3E082.8000507@sun.com>

Frank:

I was seeing the same problem.  Google points me to this bug report:

http://forum.java.sun.com/thread.jspa?threadID=584976&tstart=0

which indicates that the problem only occurs with the client ('-client') version of the compiler. So a good
work-around, if you haven't already found one, is to start java with the '-server' switch.

Paul


On Wed, 1 Dec 2004 11:59:38 -0600, Schilder, Frank (TLR Corp)
<frank.schilder at thomson.com> wrote:
> Hi,
> 
> I tried to use decision trees (J48) for my data, but Weka crashed while
> building the model with the following error message:
> #
> # HotSpot Virtual Machine Error, Internal Error
> # Please report this error at
> # http://java.sun.com/cgi-bin/bugreport.cgi
> #
> # Java VM: Java HotSpot(TM) Client VM (1.4.2_06-b03 mixed mode)
> #
> # Error ID: 43113F2652414D452D41503F491418160E435050005C
> #
> # Problematic Thread: prio=5 tid=0x009bc910 nid=0x7bc runnable
> #
> 
> Heap at VM Abort:
> Heap
> def new generation   total 1216K, used 1074K [0x10010000, 0x10160000, 0x10770000)
>  eden space 1088K,  90% used [0x10010000, 0x101069a8, 0x10120000)
>  from space 128K,  69% used [0x10120000, 0x10136190, 0x10140000)
>  to   space 128K,   0% used [0x10140000, 0x10140000, 0x10160000)
> tenured generation   total 15288K, used 11086K [0x10770000, 0x1165e000, 0x16010000)
>   the space 15288K,  72% used [0x10770000, 0x11243bf8, 0x11243c00, 0x1165e000)
> compacting perm gen  total 8704K, used 8636K [0x16010000, 0x16890000, 0x1a010000)
>   the space 8704K,  99% used [0x16010000, 0x1687f388, 0x1687f400, 0x16890000)
> 
> I called weka with  java -mx100000000 -oss100000000 -jar weka.jar
> on Windows XP
> (I use weka-3-4-3jre, but weka also crashed without the Java VM)
> 
> An older version of weka installed on a slower linux machine, however,
> works fine with my data.
> 
> I checked the archive for "problematic thread", but the problem
> described in the 'string attribute' thread there doesn't apply to my
> data, because I don't use any string attributes.
> 
> Any idea what may cause the crash of Weka here?
> 
> Thanks,
> Frank

From Paul.Lamere at Sun.COM  Wed Jan 12 03:53:47 2005
From: Paul.Lamere at Sun.COM (Paul Lamere)
Date: Wed Jan 12 03:54:00 2005
Subject: [Wekalist] Re: Problematic Thread
Message-ID: <41E3E87B.4040400@sun.com>

Frank:

I saw your post from last month; I don't know whether you found a resolution. I was seeing the same problem.
Google pointed me to this bug report:

http://forum.java.sun.com/thread.jspa?threadID=584976&tstart=0

which indicates that the problem only occurs with the client ('-client') version of the compiler. So a good
work-around, if you haven't already found one, is to start java with the '-server' switch, like so:

java -mx400m -server -jar weka.jar

Hope this helps

Paul



From gerhard at marini.at  Wed Jan 12 04:00:45 2005
From: gerhard at marini.at (Gerhard Marini)
Date: Wed Jan 12 04:00:54 2005
Subject: [Wekalist] Choosing a subset of values for a attribute
Message-ID: <opskf1bjii795nxf@smtp.1und1.com>

Dear all,

I have a nominal attribute consisting of 6 values (g1, g2, ..., g6). I
would like to classify according to this nominal attribute, but not using
all 6 values. Rather, I would prefer e.g. a pairwise classification
according to g1/g2, g1/g3, etc. Is there a way to do this other than
editing the ARFF file and deleting all rows that contain e.g. the values
g3, ..., g6 when I want to classify according to g1/g2?

Thanks for your help,

Gerhard

From fracpete at waikato.ac.nz  Wed Jan 12 09:08:34 2005
From: fracpete at waikato.ac.nz (Peter Reutemann)
Date: Wed Jan 12 09:08:39 2005
Subject: [Wekalist] Choosing a subset of values for a attribute
In-Reply-To: <opskf1bjii795nxf@smtp.1und1.com>
References: <opskf1bjii795nxf@smtp.1und1.com>
Message-ID: <41E43242.8030904@waikato.ac.nz>

Hey!

You could use the "RemoveWithValues" filter in combination with the
"FilteredClassifier" instead of editing the ARFF file manually.
The filter removes all instances that have one of the specified values,
i.e. the values whose indices you give it. If the values g1 to g6 are
ordered and you want to remove g3-g6, you basically remove the indices
3-6 of the attribute. You can do this with the following command (if the
attribute is the last one, you can drop the -C/-c options; otherwise
replace "<class-index>" with the attribute index):

   java -classpath ... \
     weka.classifiers.meta.FilteredClassifier \
     -W "weka.classifiers.trees.J48" \
     -F "weka.filters.unsupervised.instance.RemoveWithValues -L 3-6 -C <class-index>" \
     -t yourfile.arff \
     -c <class-index>

Hope that helps!

Cheers, Peter

Gerhard Marini wrote:

> Dear all,
> 
> I have a nominal attribute consisting of 6 values (g1, g2, ..., g6). I  
> would like to classify according to this nominal attribute, but not 
> using  all 6 values. Rather I would prefer e.g. a pairwise 
> classification  according to g1/g2, g1/g3, etc. Is there a way to this 
> different than to  edit the arff file and delete all rows that contain 
> e.g. the values  g3,..., g6 when I want to classify according to g1/g2?
> 
> Thanks for your help,
> 
> Gerhard
> 
> 
> 

-- 
Peter Reutemann, Dept. of Computer Science, University of Waikato
Phone +64 (7) 838-4466 Ext. 8766

From fracpete at waikato.ac.nz  Wed Jan 12 11:40:56 2005
From: fracpete at waikato.ac.nz (Peter Reutemann)
Date: Wed Jan 12 11:41:03 2005
Subject: [Wekalist] (un)expected behaviour of Experimenter?
In-Reply-To: <41E2593C.9030106@ugent.be>
References: <41E2593C.9030106@ugent.be>
Message-ID: <41E455F8.40301@waikato.ac.nz>

Hey!

Just a couple of questions, since I can't reproduce the error myself
(on neither Linux nor Win32):
- What version of WEKA do you use (I'm using a more or less recent
   CVS snapshot)?
- What exception is printed in the console (or better, the whole
   console output when the experiment is loaded)?

BTW
If you get a current CVS snapshot of WEKA, you can save your experiments
in XML rather than the binary format, which makes them usable even
across different versions of WEKA (binary experiments can no longer be
loaded if, e.g., a classifier has changed from one version to the next:
the usual hassle with the serialVersionUID).
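The serialVersionUID problem is standard Java serialization behaviour: unless a class declares a fixed serialVersionUID, the JVM derives one from the class structure, so changing the class (for example between Weka versions) makes previously serialized objects unreadable. The JDK-only sketch below uses a hypothetical ExperimentSettings class to show an explicitly declared UID and a serialization round trip; it does not show Weka's own XML format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

/** Hypothetical settings object; the explicit UID keeps old files readable after compatible changes. */
class ExperimentSettings implements Serializable {
    // Without this line the JVM computes a UID from the class structure,
    // so adding or removing a field would break loading of previously saved files.
    private static final long serialVersionUID = 1L;

    String classifier;
    int folds;

    ExperimentSettings(String classifier, int folds) {
        this.classifier = classifier;
        this.folds = folds;
    }
}

class SerialDemo {

    /** Serializes an object into a byte array (standing in for an .exp file). */
    static byte[] save(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    /** Reads the object back; fails with InvalidClassException if the UIDs do not match. */
    static Object load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ExperimentSettings s = new ExperimentSettings("weka.classifiers.trees.J48", 10);
        ExperimentSettings copy = (ExperimentSettings) load(save(s));
        System.out.println(copy.classifier + " " + copy.folds);
        System.out.println("UID: " + ObjectStreamClass.lookup(ExperimentSettings.class).getSerialVersionUID());
    }
}
```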

Cheers, Peter

Stijn Lievens wrote:

> Hi Weka users and implementers,
> 
> Just now, I noticed the following -- in my opinion -- strange behaviour 
> of the experimenter.
> 
> I did the following (I'm working with Linux)
> 
> cd $HOME
> java weka.gui.experiment.Experimenter
> <configure advanced experiment>
> save experiment under $HOME/other_dir
> 
> I thus created and saved an experiment in an *.exp file.
> 
> Now, when I tried
> 
> cd $HOME/other_dir
> java weka.gui.experiment.Experimenter
> <try to open an advanced experiment>
> 
> this failed, and the experiment could not be opened.
> 
> When I tried
> 
> cd $HOME
> java weka.gui.experiment.Experimenter
> <try to open an advanced experiment>
> 
> then everything worked out nicely.
> 
> 
> In short, it looks like, if one wants to reuse an existing experiment,
> one has to start the Experimenter from exactly the same location,
> which looks like a serious limitation to me.  Also (I have not yet
> tested this), won't this limit the possibility of configuring an
> experiment on one machine and executing it (after copying) on another
> machine?
> 
> Kind regards,
> 
> Stijn Lievens.
> 
> 

-- 
Peter Reutemann, Dept. of Computer Science, University of Waikato
Phone +64 (7) 838-4466 Ext. 8766

From abendav at netvision.net.il  Wed Jan 12 11:57:54 2005
From: abendav at netvision.net.il (Dr. Arie Ben David)
Date: Wed Jan 12 11:56:57 2005
Subject: [Wekalist] Teaching Materials on the Web
Message-ID: <000601c4f830$ffbafe20$5a9f003e@asus>

Hi everyone
Can anyone please tell me where I can find WEKA-related teaching material? The site mentioned in the book no longer seems to exist due to a merger, and I cannot find the material on Elsevier's book web site.
Thanks
Arie Ben David
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://list.scms.waikato.ac.nz/pipermail/wekalist/attachments/20050112/a4b88f03/attachment.htm
From eibe at cs.waikato.ac.nz  Wed Jan 12 14:30:29 2005
From: eibe at cs.waikato.ac.nz (Eibe Frank)
Date: Wed Jan 12 14:30:32 2005
Subject: [Wekalist] Teaching Materials on the Web
In-Reply-To: <E1CoVgB-0004fH-Nw@ghoul.scms.waikato.ac.nz>
References: <E1CoVgB-0004fH-Nw@ghoul.scms.waikato.ac.nz>
Message-ID: <8B108CD4-6439-11D9-A7E4-000A959DE03E@cs.waikato.ac.nz>

The link on our book page should work 
(http://www.cs.waikato.ac.nz/~ml/weka/book.html). Let me know if it 
doesn't.

Cheers,
Eibe

> Hi everyone
> Can anyone please tell me where I can find WEKA-related teaching
> material? The site mentioned in the book no longer seems to exist
> due to a merger, and I cannot find the material on Elsevier's book web
> site.
> Thanks
> Arie Ben David
>


From stijn.lievens at ugent.be  Wed Jan 12 22:37:24 2005
From: stijn.lievens at ugent.be (Stijn Lievens)
Date: Wed Jan 12 22:37:41 2005
Subject: [Wekalist] (un)expected behaviour of Experimenter?
In-Reply-To: <41E455F8.40301@waikato.ac.nz>
References: <41E2593C.9030106@ugent.be> <41E455F8.40301@waikato.ac.nz>
Message-ID: <41E4EFD4.7000203@ugent.be>

Peter Reutemann wrote:
> Hey!
> 
> Just a couple of questions, since I can't reproduce the error myself
> (on either Linux or Win32):
> - What version of WEKA do you use (I'm using a more or less recent
>   CVS snapshot)?
> - What exception is printed in the console (or better, the whole
>   console output when the experiment is loaded)?
> 

Hi Peter,

I'm sorry.  In trying to reproduce the error I sorted out what was wrong
myself (and in fact it has something to do with your next remark).

The thing is, I have two versions of Weka installed on my computer:
one in /usr/local and one in my home directory.

In my classpath, the current directory comes first.
So what actually happened was the following:
I created the experiment from my home directory, thus using the
Weka version accessible from there (because it comes first in the
classpath).  When I tried to open the experiment from another
directory, the Weka version installed in /usr/local was used, and
apparently the two have different versions of weka.core.FastVector.
(This was just the first class it encountered where there was a
problem; there will probably be many other classes that have changed.)
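When two Weka installations are reachable from the classpath, it can help to ask the JVM directly which one supplied a given class. The JDK-only sketch below (the class name WhichJar is hypothetical) prints the classpath entries in search order and the location a named class was actually loaded from; passing weka.core.FastVector as the argument would show which installation wins.

```java
import java.io.File;
import java.net.URL;
import java.security.CodeSource;
import java.security.ProtectionDomain;

/** Prints the classpath and where a given class was actually loaded from. */
class WhichJar {

    /** Returns the location a class was loaded from, or a note for JDK classes. */
    static String locate(Class<?> c) {
        ProtectionDomain pd = c.getProtectionDomain();
        CodeSource cs = (pd == null) ? null : pd.getCodeSource();
        URL url = (cs == null) ? null : cs.getLocation();
        return (url == null) ? "(JDK/bootstrap class, no code source)" : url.toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Entries are searched in this order; the first one containing a class wins.
        for (String entry : System.getProperty("java.class.path").split(File.pathSeparator)) {
            System.out.println("classpath entry: " + entry);
        }
        // e.g. pass weka.core.FastVector to see which installation supplies it.
        String name = args.length > 0 ? args[0] : WhichJar.class.getName();
        System.out.println(name + " loaded from " + locate(Class.forName(name)));
    }
}
```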

> BTW
> If you get a current CVS snapshot of WEKA, you can save your experiments
> in XML rather than the binary format, which makes them usable even
> across different versions of WEKA (binary experiments can no longer be
> loaded if, e.g., a classifier has changed from one version to the next:
> the usual hassle with the serialVersionUID).
> 

this is a very good tip, which I will certainly try!

Cheers,

Stijn.

> Cheers, Peter
> 
> Stijn Lievens wrote:
> 
>> Hi Weka users and implementers,
>>
>> Just now, I noticed the following -- in my opinion -- strange 
>> behaviour of the experimenter.
>>
>> I did the following (I'm working with Linux)
>>
>> cd $HOME
>> java weka.gui.experiment.Experimenter
>> <configure advanced experiment>
>> save experiment under $HOME/other_dir
>>
>> I thus created and saved an experiment in an *.exp file.
>>
>> Now, when I tried
>>
>> cd $HOME/other_dir
>> java weka.gui.experiment.Experimenter
>> <try to open an advanced experiment>
>>
>> this failed, and the experiment could not be opened.
>>
>> When I tried
>>
>> cd $HOME
>> java weka.gui.experiment.Experimenter
>> <try to open an advanced experiment>
>>
>> then everything worked out nicely.
>>
>>
>> In short, it looks like, if one wants to reuse an existing experiment,
>> one has to start the Experimenter from exactly the same location,
>> which looks like a serious limitation to me.  Also (I have not yet
>> tested this), won't this limit the possibility of configuring an
>> experiment on one machine and executing it (after copying) on another
>> machine?
>>
>> Kind regards,
>>
>> Stijn Lievens.
>>
>>
> 


-- 
==========================================================================
Dept. of Applied Mathematics and Computer Science, University of Ghent
Krijgslaan 281 - S9, B - 9000 Ghent, Belgium
Phone: +32-9-264.48.91, Fax: +32-9-264.49.95
E-mail: Stijn.Lievens@ugent.be, URL: http://allserv.ugent.be/~slievens/
==========================================================================

From jmgomez at uem.es  Wed Jan 12 22:50:51 2005
From: jmgomez at uem.es (Jose Maria Gomez Hidalgo)
Date: Wed Jan 12 22:51:47 2005
Subject: [Wekalist] Teaching Materials on the Web
In-Reply-To: <000601c4f830$ffbafe20$5a9f003e@asus>
References: <000601c4f830$ffbafe20$5a9f003e@asus>
Message-ID: <6.0.3.0.2.20050112104643.02f08308@correo.uem.es>

At 23:57 11/01/2005, Dr. Arie Ben David wrote:
>Hi everyone
>Can anyone please tell me where I can find WEKA-related teaching
>material? The site mentioned in the book no longer seems to exist
>due to a merger, and I cannot find the material on Elsevier's book web site.

I have no problems accessing the Weka book page
(http://www.cs.waikato.ac.nz/~ml/weka/book.html), nor the Morgan Kaufmann
teaching page linked from it. If you can't access them, I can pack the
material and send it to you.

>Thanks
>Arie Ben David

Jose Maria Gomez Hidalgo
Departamento de Sistemas Informáticos
Universidad Europea de Madrid
28670 - Villaviciosa de Odon - MADRID
(+34) 912115670
jmgomez@uem.es
http://www.esi.uem.es/~jmgomez/

Spanish law guarantees privacy in electronic communications. This 
electronic transmission is strictly confidential and intended solely for 
the addressee. If you are not the intended addressee, you are kindly 
requested not to disclose nor to copy this transmission and to notify us as 
soon as possible.


From amarnath at gmx.de  Thu Jan 13 04:37:11 2005
From: amarnath at gmx.de (Amarnath Anumandla)
Date: Thu Jan 13 04:37:19 2005
Subject: [Wekalist] Help on Model Trees
Message-ID: <26984.1105544231@www73.gmx.net>

Hi

Can anyone tell me where I can find more information on model trees (i.e. the
M5P method under trees in the Weka tool) than in the Data Mining book?

Thanks in advance

Best regards,
Amarnath


From sankut10 at aut.ac.nz  Thu Jan 13 10:01:35 2005
From: sankut10 at aut.ac.nz (Sangeetha Kutty)
Date: Thu Jan 13 10:01:55 2005
Subject: [Wekalist] Re: Wekalist Digest, Vol 23, Issue 12
In-Reply-To: <200501112354.j0BNsMfu025779@horuhoru.aut.ac.nz>
References: <200501112354.j0BNsMfu025779@horuhoru.aut.ac.nz>
Message-ID: <1105563695.41e5902f54bc2@webmail.aut.ac.nz>

Hello Arie,
If you meant a guide to the Weka Explorer (how to use the Explorer in Weka),
you can find one at:

http://homepage.cs.uri.edu/faculty/hamel/courses/spring2004/csc492/WekaExplorerGuide.pdf

Though there are other websites, I used this one, as the content remains the
same.

Hope this helps,
Regards,
Sangeetha



From wlzbd at hotmail.com  Thu Jan 13 13:09:21 2005
From: wlzbd at hotmail.com (w lizeng)
Date: Thu Jan 13 13:10:09 2005
Subject: [Wekalist] How to perfect an initialization bayesian network?
Message-ID: <BAY24-F37E491ACF638B35DC057D1A88A0@phx.gbl>

An HTML attachment was scrubbed...
URL: https://list.scms.waikato.ac.nz/pipermail/wekalist/attachments/20050113/8b764acb/attachment.htm
From bthom at cs.hmc.edu  Thu Jan 13 13:57:08 2005
From: bthom at cs.hmc.edu (belinda thom)
Date: Thu Jan 13 13:56:25 2005
Subject: [Wekalist] terminology questions: functional approximation /
	classification
Message-ID: <0CC13F12-64FE-11D9-8417-000D93ACC694@cs.hmc.edu>

Hi there WEKA-ans,

First, thanks for such a great toolbox! Will likely be using this in a 
machine learning course I'm running for the first time this Spring ... 
am very grateful and impressed by the quality and effort that's been 
put into this.

I am a bit confused about what "classifier" means in WEKA and was 
hoping to solicit some history, opinions, etc.

When I use the term classifier, I usually refer to classification as 
mapping from a feature vector x into some class y, which is an 
enumerated list of discrete, nominal things.  In contrast, when 
learning concept y = f(x) where y is real and x is some vector of, say, 
reals, I'd call this function approximation.

Now WEKA has weka.classifiers.functions (which include things like 
linear regression and min weighted squares) but it's also got piles of 
things that do (only?) discrete, nominal classification: SVM, bayes 
classifiers, etc.

In your documentation, does classifier rather refer to the fact that 
learning is supervised? It appears to us from our brief series of tests 
that you can specify function approximation by specifying in the ARFF 
file that the class label is "real". Similarly, learning rankings could 
be done by saying the class label is "integer". Nominal classification 
tasks would be "nominal" (or whatever the ARFF syntax for that is).

Also, it seems to us that the functions part of WEKA is a bit less 
stable (for instance, prior sorce forge is needed for some of the 
regression to work properly). Does this mean that there is much less 
interest in function approximation in WEKA than on nominal 
classification?

Thanks a bunch!
--b


From hien at pmail.ntu.edu.sg  Thu Jan 13 15:22:06 2005
From: hien at pmail.ntu.edu.sg (#NGUYEN VAN HIEN#)
Date: Thu Jan 13 15:22:14 2005
Subject: [Wekalist] Problem when using Weka Explorer to open large datasets
Message-ID: <E030192F65406648905385147031729E49A920@mail03.student.main.ntu.edu.sg>

Hi all,
When I use the Weka Explorer to open the continuous breastCancer dataset (24481
attributes, 97 instances) on a computer with 512MB of memory, it fails with an
"out of memory" error message. I have not changed Weka's configuration in any
way. I would like to know how to solve the problem.
 
Thanks
Nguyen Van Hien
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://list.scms.waikato.ac.nz/pipermail/wekalist/attachments/20050113/921e2083/attachment.htm
From fracpete at waikato.ac.nz  Thu Jan 13 16:48:10 2005
From: fracpete at waikato.ac.nz (Peter Reutemann)
Date: Thu Jan 13 16:48:17 2005
Subject: [Wekalist] Problem when using Weka Explorer to open large datasets
In-Reply-To: <E030192F65406648905385147031729E49A920@mail03.student.main.ntu.edu.sg>
References: <E030192F65406648905385147031729E49A920@mail03.student.main.ntu.edu.sg>
Message-ID: <41E5EF7A.4050500@waikato.ac.nz>

Hey!

With large datasets it can happen that the default heap size of the 
virtual machine is not enough. You can increase the maximum heap size 
with the "-Xmx" parameter. You can e.g. start the Explorer with 256MB as 
heap size:
    java -Xmx256m -classpath weka.jar weka.gui.explorer.Explorer
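To check that an -Xmx setting has actually taken effect, a JVM can report its own heap limit through Runtime.maxMemory(). A tiny JDK-only sketch (the class name is hypothetical):

```java
/** Reports the largest heap the running JVM will attempt to use (what -Xmx controls). */
class HeapCheck {

    /** Maximum heap size in megabytes as reported by Runtime.maxMemory(). */
    static long maxHeapMegabytes() {
        // maxMemory() returns Long.MAX_VALUE when the JVM has no inherent limit.
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Maximum heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Started with java -Xmx256m HeapCheck, it should print a figure close to 256; some JVMs report slightly less than the -Xmx value.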

Cheers, Peter

#NGUYEN VAN HIEN# wrote:
> Hi all,
> 
> When I use the Weka Explorer to open the continuous breastCancer dataset (24481
> attributes, 97 instances) on a computer with 512MB of memory, it fails with an
> "out of memory" error message. I have not changed Weka's configuration in any
> way. I would like to know how to solve the problem.
> 
>  
> 
> Thanks
> 
> Nguyen Van Hien
> 

-- 
Peter Reutemann, Dept. of Computer Science, University of Waikato
Phone +64 (7) 838-4466 Ext. 8766

From luciana.bucene at agr.unicamp.br  Fri Jan 14 02:23:03 2005
From: luciana.bucene at agr.unicamp.br (Luciana Corpas Bucene)
Date: Fri Jan 14 02:23:12 2005
Subject: [Wekalist] weka Brazil
Message-ID: <1534.200.0.70.148.1105622583.squirrel@new.host.name>

Hi everybody,

Please excuse my difficulties with English. I would like to know whether there
are people working with Weka in Brazil. This would facilitate the
exchange of experiences and of difficulties encountered during the work.
Thank you

Luci.





From czhang at fzi.de  Fri Jan 14 02:31:56 2005
From: czhang at fzi.de (Changgong Zhang)
Date: Fri Jan 14 02:32:04 2005
Subject: [Wekalist] how to deal with weka output
Message-ID: <001501c4f974$4022edd0$8307158d@fzi.de>

Hello,

When I use Weka from my Java program to perform some task (like Apriori),
I get the output in the Eclipse console, but I want to save the output in a
Java list. So I would like to know how my Java program can read the output
and use it further.
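Weka schemes generally return the text they would print from their toString() method, so splitting that string into lines may already be enough. A more general, JDK-only approach that works for any code writing to the console is to redirect System.out temporarily. The sketch below does that; the class name and the sample lines are placeholders, not real Apriori output.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Captures whatever a piece of code prints to System.out as a list of non-blank lines. */
class CaptureOutput {

    static List<String> captureLines(Runnable task) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintStream capture = new PrintStream(buffer, true, StandardCharsets.UTF_8)) {
            System.setOut(capture);
            task.run();
        } finally {
            System.setOut(original); // always restore the real console
        }
        List<String> lines = new ArrayList<>();
        for (String line : buffer.toString(StandardCharsets.UTF_8).split("\\R")) {
            if (!line.isBlank()) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Placeholder text standing in for whatever the Weka code prints.
        List<String> lines = captureLines(() -> {
            System.out.println("Best rules found:");
            System.out.println();
            System.out.println(" 1. placeholder rule");
        });
        System.out.println(lines.size() + " lines captured: " + lines);
    }
}
```

Because System.out is global, anything other threads print during the capture ends up in the list too; where a scheme offers toString(), using that is simpler.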


From eibe at cs.waikato.ac.nz  Fri Jan 14 10:06:56 2005
From: eibe at cs.waikato.ac.nz (Eibe Frank)
Date: Fri Jan 14 10:06:55 2005
Subject: [Wekalist] Important: buggy JVMs and Weka (in particular, SMO)
Message-ID: <0EA12D50-65A7-11D9-A7E4-000A959DE03E@cs.waikato.ac.nz>

When you are using SMO in Weka, or any other technique that relies on 
numerical computations (i.e. almost everything), please make sure that 
you are using a Java Virtual Machine that gets its computations right. 
We have had problems with:

(a) the IBM Java Virtual Machines (this does not include the Jikes RVM,
which we haven't tried yet), and
(b) some Sun Java Virtual Machines run in server mode (i.e. using the 
-server flag)

We are now using the latest Sun Java 1.5 Virtual Machine (1.5.0), which 
appears to work fine in both normal and server mode.

Note that we have never observed a problem with Sun JVMs that were not 
run in server mode. Also, we have stopped using the IBM JVM a while 
ago, so we don't know whether the problems have been fixed in recent 
versions.

So far we haven't observed any problems with Apple's JVM or Microsoft's 
JVM (however, we don't use the latter one very much).

A good indication that something is not right is when you are getting 
different results when you repeatedly run a learning scheme. This 
should not happen as long as you do not change the random number seed 
used for shuffling the data. Often the accuracy is also 
catastrophically bad.

It appears that some JVMs are a bit too enthusiastic about performing 
JIT optimizations.
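The symptom described above (different results from repeated runs with an unchanged random seed) is easy to test for automatically. The JDK-only sketch below compares results bit for bit; workload is a hypothetical stand-in that you would replace with a seeded train-and-evaluate run of the scheme in question. Passing the check does not prove a JVM is correct; it only rules out this particular symptom.

```java
import java.util.Random;
import java.util.function.LongToDoubleFunction;

/** Sanity check: a seeded computation should give bit-identical results on every run. */
class DeterminismCheck {

    /** Hypothetical stand-in for a learner: a seeded numeric workload. */
    static double workload(long seed) {
        Random rnd = new Random(seed);
        double sum = 0.0;
        for (int i = 0; i < 100_000; i++) {
            double x = rnd.nextGaussian();
            sum += x * x + Math.sqrt(Math.abs(x));
        }
        return sum;
    }

    /** Runs the task repeatedly with one seed and compares the exact bit patterns. */
    static boolean isRepeatable(LongToDoubleFunction task, long seed, int runs) {
        long expected = Double.doubleToLongBits(task.applyAsDouble(seed));
        for (int i = 1; i < runs; i++) {
            if (Double.doubleToLongBits(task.applyAsDouble(seed)) != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Many repetitions give the JIT compiler a chance to kick in between runs.
        System.out.println("repeatable: " + isRepeatable(DeterminismCheck::workload, 1L, 20));
    }
}
```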

Cheers,
Eibe

PS: Recently, there has been a report on a discrepancy between the 
results obtained with Weka's SMO and LIBSVM. This appears to have been 
due to a buggy IBM JVM.

PPS: If you run SMO on datasets with more than 1000 instances, you 
might want to increase the size of the kernel cache (which is set to 
~1,000,000 by default) to speed things up. Ideally it should be larger 
than the square of the number of training instances.
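The rule of thumb above is simple arithmetic: caching every pairwise kernel value for n training instances takes about n * n entries, so a default of roughly 1,000,000 covers n = 1000 exactly. The sketch below (hypothetical helper; it assumes the cache is counted in kernel evaluations, as the note implies) computes that figure, using long arithmetic because n * n overflows an int once n exceeds 46340.

```java
/** Arithmetic behind the kernel-cache advice: a full cache needs about n * n entries. */
class KernelCacheSize {

    /** Approximate default cache size quoted in the message above. */
    static final long DEFAULT_CACHE = 1_000_000L;

    /** Entries needed to cache every pairwise kernel value for n training instances. */
    static long fullCacheSize(int numInstances) {
        long n = numInstances; // widen first: n * n overflows an int for n > 46340
        return n * n;
    }

    public static void main(String[] args) {
        for (int n : new int[] {500, 1000, 5000}) {
            long needed = fullCacheSize(n);
            System.out.printf("n=%d: ~%,d entries, default large enough: %b%n",
                    n, needed, needed <= DEFAULT_CACHE);
        }
    }
}
```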


From eibe at cs.waikato.ac.nz  Fri Jan 14 14:19:46 2005
From: eibe at cs.waikato.ac.nz (Eibe Frank)
Date: Fri Jan 14 14:19:46 2005
Subject: [Wekalist] terminology questions: functional approximation /
	classification
In-Reply-To: <E1CpETH-0002gE-A8@ghoul.scms.waikato.ac.nz>
References: <E1CpETH-0002gE-A8@ghoul.scms.waikato.ac.nz>
Message-ID: <6100CC67-65CA-11D9-8F91-000A959DE03E@cs.waikato.ac.nz>

Yes, the notion of classifiers in Weka is confusing. Some of them can 
predict numeric targets, others nominal ones, and some both types.

Note that this is not restricted to weka.classifiers.functions. When we 
decided to divide the classifiers in Weka into sub packages we had to 
decide on names, and we chose the name "functions" to refer to the 
models produced by the algorithms in that package because people 
generally think of them as mathematical functions. We weren't really 
all that happy with the name but couldn't think of anything better.

Note that Weka doesn't distinguish between real- and integer-valued
attributes: both are simply "numeric".
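To make this concrete: in an ARFF header, numeric, real and integer all declare the same kind of attribute, and it is the type of the class attribute (numeric versus nominal) that determines whether a scheme is asked to do regression or classification. The helper below is a hypothetical sketch, not Weka's ARFF parser; it only classifies the type part of an @attribute declaration and rejects anything it does not recognise.

```java
import java.util.Locale;

/** Hypothetical helper (not Weka's ARFF parser): classifies the type part of an @attribute line. */
class ArffTypes {

    enum Kind { NUMERIC, NOMINAL, STRING, DATE }

    static Kind kindOf(String typeSpec) {
        String t = typeSpec.trim();
        if (t.startsWith("{")) {
            return Kind.NOMINAL; // e.g. {g1,g2,g3}
        }
        String lower = t.toLowerCase(Locale.ROOT);
        if (lower.equals("numeric") || lower.equals("real") || lower.equals("integer")) {
            return Kind.NUMERIC; // all three end up as the same "numeric" kind
        }
        if (lower.equals("string")) {
            return Kind.STRING;
        }
        if (lower.equals("date") || lower.startsWith("date ")) {
            return Kind.DATE; // optionally followed by a date format
        }
        throw new IllegalArgumentException("unrecognised ARFF type: " + typeSpec);
    }

    public static void main(String[] args) {
        for (String spec : new String[] {"real", "integer", "NUMERIC", "{g1,g2,g3}"}) {
            System.out.println(spec + " -> " + kindOf(spec));
        }
    }
}
```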

I'm not quite sure what you mean by "prior sorce forge is needed for 
some of the regression to work properly". All methods in Weka are 
supposed to be equally stable. If something doesn't work properly we 
would like to hear about it, and we will endeavor to fix it, no matter 
what algorithm in Weka has the problem. However, it is true that some 
bug fixes are only in the CVS repository because we haven't had a 
chance to make a new release.

Cheers,
Eibe


On Jan 14, 2005, at 12:33 PM, wekalist-request@list.scms.waikato.ac.nz 
wrote:

> I am a bit confused about what "classifier" means in WEKA and was 
> hoping to solicit some history, opinions, etc.
>
> When I use the term classifier, I usually refer to classification as 
> mapping from a feature vector x into some class y, which is an 
> enumerated list of discrete, nominal things.  In contrast, when 
> learning concept y = f(x) where y is real and x is some vector of, 
> say, reals, I'd call this function approximation.
>
> Now WEKA has weka.classifiers.functions (which include things like 
> linear regression and min weighted squares) but it's also got piles of 
> things that do (only?) discrete, nominal classification: SVM, bayes 
> classifiers, etc.
>
> In your documentation, does classifier rather refer to the fact that 
> learning is supervised? It appears to us from our brief series of 
> tests that you can specify function approximation by specifying in the 
> ARFF file that the class label is "real". Similarly, learning rankings 
> could be done by saying the class label is "integer". Nominal 
> classification tasks would be "nominal" (or whatever the ARFF syntax 
> for that is).
>
> Also, it seems to us that the functions part of WEKA is a bit less 
> stable (for instance, prior sorce forge is needed for some of the 
> regression to work properly). Does this mean that there is much less 
> interest in function approximation in WEKA than on nominal 
> classification?


From daveho at cs.umd.edu  Fri Jan 14 14:45:22 2005
From: daveho at cs.umd.edu (David Hovemeyer)
Date: Fri Jan 14 14:45:30 2005
Subject: [Wekalist] Ignoring some attributes during learning?
Message-ID: <20050114014521.GB13810@cs.umd.edu>

Howdy all,

Apologies if this is a FAQ.

I am using Weka to classify data sets in my research.  Currently,
I am experimenting with various ways of encoding my data
as ARFF tuples.  In order to map the classification predictions made by
Weka back to my original data, I would like to encode an "id" attribute in
the ARFF files used for the training and test runs.  However, I obviously
don't want this attribute to be used in the learning process.

Is there an easy way to do this?

If it makes any difference, I'm calling Weka from my own Java code,
rather than the command line or GUI interfaces.
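One generic way to handle an ID, sketched here in plain Java rather than with Weka's API (all names are hypothetical), is to strip the ID column from the feature vector the learner sees while keeping the rows in their original order, so that prediction i can be matched back to the ID of row i. Within Weka itself, wrapping an attribute-removing filter such as weka.filters.unsupervised.attribute.Remove in a FilteredClassifier, in the same way RemoveWithValues is wrapped in the command earlier in this archive, is a likely alternative.

```java
import java.util.Arrays;

/** Hypothetical sketch (plain Java, not Weka's API): hide an ID column from the learner. */
class IdColumnSketch {

    /** Copies a row without the column at idIndex, so the learner never sees the ID. */
    static double[] withoutColumn(double[] row, int idIndex) {
        double[] out = new double[row.length - 1];
        for (int i = 0, j = 0; i < row.length; i++) {
            if (i != idIndex) {
                out[j++] = row[i];
            }
        }
        return out;
    }

    /** Stand-in for a trained model: a made-up threshold on the first feature. */
    static String predict(double[] features) {
        return features[0] > 0.5 ? "yes" : "no";
    }

    public static void main(String[] args) {
        int idIndex = 0;
        double[][] rows = { {101, 0.2, 5.0}, {102, 0.9, 1.0} };
        // Rows stay in the same order, so prediction i belongs to the ID in row i.
        for (double[] row : rows) {
            double[] features = withoutColumn(row, idIndex);
            System.out.println((long) row[idIndex] + " -> " + predict(features)
                    + "  features=" + Arrays.toString(features));
        }
    }
}
```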

Thanks,
Dave

From bthom at cs.hmc.edu  Fri Jan 14 17:54:39 2005
From: bthom at cs.hmc.edu (belinda thom)
Date: Fri Jan 14 17:53:54 2005
Subject: [Wekalist] Ignoring some attributes during learning?
In-Reply-To: <20050114014521.GB13810@cs.umd.edu>
References: <20050114014521.GB13810@cs.umd.edu>
Message-ID: <65652C47-65E8-11D9-8417-000D93ACC694@cs.hmc.edu>

this is a very interesting question. we too were wondering what  
facilities were in place in weka to map back from something learned to  
the domain you're actually interested in. for instance, suppose i ran  
k-means to cluster the pixels in an image. i'd convert the RGB into 3  
attribute features and do unsupervised learning. it would map to  
classes. those classes would have RGB template values. to what extent  
can weka help with mapping back into a new image file?
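Once the cluster centres are known, mapping back can be done outside Weka: each pixel is replaced by the RGB template of its nearest centre. In the JDK-only sketch below (hypothetical names), pixels are packed 0xRRGGBB ints of the kind java.awt.image.BufferedImage.getRGB returns and setRGB accepts; the centres are invented and assumed to be already rounded to 0-255 integers.

```java
/** Hypothetical sketch: recolour pixels with the RGB template of their nearest cluster centre. */
class QuantizeImage {

    /** Index of the centre closest to rgb in squared Euclidean distance. */
    static int nearest(int[] rgb, int[][] centres) {
        int best = 0;
        long bestDist = Long.MAX_VALUE;
        for (int c = 0; c < centres.length; c++) {
            long dist = 0;
            for (int k = 0; k < 3; k++) {
                long diff = rgb[k] - centres[c][k];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** Maps packed 0xRRGGBB pixels (any alpha byte is ignored) to their centre's colour. */
    static int[] quantize(int[] pixels, int[][] centres) {
        int[] out = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            int p = pixels[i];
            int[] rgb = {(p >> 16) & 0xFF, (p >> 8) & 0xFF, p & 0xFF};
            int[] c = centres[nearest(rgb, centres)];
            out[i] = (c[0] << 16) | (c[1] << 8) | c[2];
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up centres standing in for the centroids a k-means run would report.
        int[][] centres = {{20, 30, 40}, {230, 220, 210}};
        int[] pixels = {0x102030, 0xEEDDCC, 0x303030};
        for (int p : quantize(pixels, centres)) {
            System.out.printf("%06X%n", p);
        }
    }
}
```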

we are very new to Weka, of course. i too apologize if this is in a FAQ.

--b

On Jan 13, 2005, at 5:45 PM, David Hovemeyer wrote:

> Howdy all,
>
> Apologies if this is a FAQ.
>
> I am using Weka to classify data sets in my research.  Currently,
> I am experimenting with various ways of encoding my data
> as ARFF tuples.  In order to map the classification predictions made by
> Weka back to my original data, I would like to encode an "id" attribute in
> the ARFF files used for the training and test runs.  However, I obviously
> don't want this attribute to be used in the learning process.
>
> Is there an easy way to do this?
>
> If it makes any difference, I'm calling Weka from my own Java code,
> rather than the command line or GUI interfaces.
>
> Thanks,
> Dave
>
>
Dr. Belinda Thom
---------------------------------------------------------------------------------------------------
http://www.cs.hmc.edu/~bthom                    909-607-9662
Asst. Professor, Computer Science               fax 607-8364
Harvey Mudd College                             1241 Olin Hall
1250 Dartmouth Ave, Claremont, CA, 91711        physical address
301 E. 12th Street, Claremont, CA, 91711, USA   mailing address


From hicheehau at gmail.com  Sat Jan 15 01:39:18 2005
From: hicheehau at gmail.com (Chee Hau)
Date: Sat Jan 15 01:39:35 2005
Subject: [Wekalist] Implementation of Jumping Emerging Pattern (JEP)
	classifier using Weka
Message-ID: <000c01c4fa36$144b6580$b47874cb@SHOT>

Skipped content of type multipart/alternative
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/gif
Size: 1588 bytes
Desc: not available
Url : https://list.scms.waikato.ac.nz/pipermail/wekalist/attachments/20050114/64cad5b2/attachment.gif
From saunier at enst.fr  Sat Jan 15 04:59:47 2005
From: saunier at enst.fr (Saunier Nicolas)
Date: Sat Jan 15 05:05:10 2005
Subject: [Wekalist] T-paired tests
In-Reply-To: <722-Mon10Jan2005134350-0800-tgd@cs.orst.edu>
References: <E9A26208-6333-11D9-9102-000A95BB8B1A@enst.fr>	<a39a6670050110115653200c7b@mail.gmail.com>
	<722-Mon10Jan2005134350-0800-tgd@cs.orst.edu>
Message-ID: <41E7EC73.8020404@enst.fr>

Thanks for all the information.

In fact, I have also been working on machine learning in data streams.
In the typical setting, I compare algorithms that are initialized
randomly on a first dataset, learn on the same data stream, and are
tested on the same set. What kind of test should I use in such a setting?
Tests based on cross-validation cannot be used there.
Best regards,

Nicolas Saunier

Thomas G. Dietterich wrote:

>The paired T-test should never be used for comparing learning
>algorithms.  It has very high Type I error and is not sound.
>I recommend either McNemar's test or the 5x2cv F test developed by
>Alpaydin.  It is a pity that neither of these tests is implemented in
>WEKA. 
>
>For more details, see
>
>@article{a-cftcscla-99,
>author = {E. Alpaydin},
>year = 1999,
>title = {Combined 5x2cv {F} Test for Comparing Supervised
>Classification Learning Algorithms},
>journal = {Neural Computation},
>volume = 11,
>number = 8, 
>pages = {1885--1892}}
>
>and
>
>@article{d-astdscla-98,
>title = {Approximate Statistical Tests for Comparing
>Supervised Classification Learning Algorithms},
>author = {Thomas G. Dietterich},
>journal = {Neural Computation},
>volume = {10},
>number = 7,
>pages = {1895--1924},
>year = {1998}}
>
>The latter is available from my web page.
>
>  
>
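McNemar's test, recommended in the quoted reply, needs only a single test set, which may suit the stream setting described above (Dietterich's paper discusses it for cases where the algorithms can be run only once). It looks only at disagreements: n01 counts test examples misclassified by A but not by B, n10 the reverse, and with the continuity correction the statistic (|n01 - n10| - 1)^2 / (n01 + n10) is compared with a chi-squared distribution with 1 degree of freedom. A minimal JDK-only sketch (the class name is hypothetical); note that the chi-squared approximation becomes unreliable when n01 + n10 is small.

```java
/** McNemar's test with continuity correction for two classifiers scored on one test set. */
class McNemar {

    /** 95% quantile of the chi-squared distribution with 1 degree of freedom. */
    static final double CHI2_1DF_95 = 3.841;

    /**
     * @param n01 test examples misclassified by classifier A but not by B
     * @param n10 test examples misclassified by classifier B but not by A
     * @return the chi-squared statistic (1 degree of freedom)
     */
    static double statistic(int n01, int n10) {
        int disagreements = n01 + n10;
        if (disagreements == 0) {
            return 0.0; // the classifiers never disagree, so there is no evidence of a difference
        }
        double d = Math.abs(n01 - n10) - 1.0;
        return d * d / disagreements;
    }

    static boolean significantAt5Percent(int n01, int n10) {
        return statistic(n01, n10) > CHI2_1DF_95;
    }

    public static void main(String[] args) {
        // Made-up counts: A alone is wrong on 10 examples, B alone on 2.
        System.out.println(statistic(10, 2) + " significant: " + significantAt5Percent(10, 2));
    }
}
```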

From W.Shi at cs.bham.ac.uk  Sat Jan 15 05:35:15 2005
From: W.Shi at cs.bham.ac.uk (Wenqi Shi)
Date: Sat Jan 15 05:35:21 2005
Subject: [Wekalist] Disease Classification Problem
Message-ID: <Pine.GSO.4.56.0501141622030.6052@preston.cs.bham.ac.uk>


Hello all,

I am writing to find out whether WEKA has the capabilities to deal
with this kind of dataset. The dataset has more than 700 patient
records, each of which includes 556 symptom attributes and 221
disease attributes.  The dataset is a bit sparse, with a mean of 7 +/- 4
diagnoses per record (7 is the mean and 4 the standard deviation) and a
mean of 50 +/- 20 relevant symptoms per record.

Thus, given a new patient, we need to know not only whether the
patient has each of these 221 diseases (they may have many), but also
how likely these predictions are to be correct (e.g. we predict
6 diseases, but probably only four of these are true positives...)

How could the classifiers in WEKA do this kind of multiple prediction?

Advice greatly appreciated.

Wenqi Shi
=============================================
Room 144,
School of Computer Science,
University of Birmingham,
Edgbaston
BIRMINGHAM,
B15 2TT.
United Kingdom


From mpechen at cc.jyu.fi  Sun Jan 16 07:47:30 2005
From: mpechen at cc.jyu.fi (Mykola Pechenizkiy)
Date: Sun Jan 16 07:47:17 2005
Subject: [Wekalist] where is KDTree?
In-Reply-To: <200501102346.j0ANjwG8024950@posti3.jyu.fi>
Message-ID: <200501151847.j0FIl6hA010901@posti6.jyu.fi>

Hello,

There used to be KDTree (in weka.core.KDTree if I remember correctly)
indexing in one of the previous releases of WEKA.
Does anyone know if it has been moved somewhere or removed?

Thanks,
Mykola

--
Mykola Pechenizkiy, Ph. Lic.
Department of Computer Science and Information Systems 
University of Jyväskylä
P.O. Box 35
40351 Jyväskylä
Finland

mpechen@cs.jyu.fi
www.cs.jyu.fi/~mpechen



From tbassani at ppgia.pucpr.br  Sun Jan 16 14:29:47 2005
From: tbassani at ppgia.pucpr.br (tbassani@ppgia.pucpr.br)
Date: Sun Jan 16 16:49:24 2005
Subject: [Wekalist] Principal Components Analysis algorithm 
Message-ID: <1105838986.41e9c38b007b6@wwws.ppgia.pucpr.br>

Hello,

I am working with the PCA algorithm, and WEKA supports only a simple version,
for attribute selection. However, I need:
- the matrix produced by the principal components analysis,
- the component scores,
- the component variances.

Moreover, I need rotation of the factor analysis or principal components
analysis loadings.

Is there a plug-in that could give me some of this information?

Any information would be helpful.
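For the component variances and scores, the underlying computation is small enough to sketch directly. The JDK-only code below (hypothetical class name) finds only the leading principal component by power iteration on the sample covariance matrix: the eigenvalue is that component's variance, the unit eigenvector gives the loadings, and projecting the centred rows onto it gives the scores. It does not standardise attributes, compute further components, or perform any rotation such as varimax, and Weka's own PCA may differ in those respects.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch: leading principal component of a data matrix via power iteration. */
class FirstPrincipalComponent {

    final double variance;   // eigenvalue: variance of the data along the component
    final double[] loadings; // unit-length eigenvector of the covariance matrix (sign is arbitrary)
    final double[] scores;   // projection of each centred row onto the component

    FirstPrincipalComponent(double[][] data, int iterations) {
        int n = data.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two rows");
        }
        int d = data[0].length;

        // 1. Centre every attribute on its mean.
        double[] mean = new double[d];
        for (double[] row : data) {
            for (int j = 0; j < d; j++) {
                mean[j] += row[j] / n;
            }
        }
        double[][] centred = new double[n][d];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < d; j++) {
                centred[i][j] = data[i][j] - mean[j];
            }
        }

        // 2. Sample covariance matrix (divides by n - 1).
        double[][] cov = new double[d][d];
        for (double[] row : centred) {
            for (int a = 0; a < d; a++) {
                for (int b = 0; b < d; b++) {
                    cov[a][b] += row[a] * row[b] / (n - 1);
                }
            }
        }

        // 3. Power iteration from a seeded random start: multiply by cov, renormalise.
        Random rnd = new Random(1);
        double[] v = new double[d];
        for (int j = 0; j < d; j++) {
            v[j] = rnd.nextDouble() + 0.1;
        }
        normalise(v);
        for (int it = 0; it < iterations; it++) {
            double[] w = multiply(cov, v);
            if (dot(w, w) == 0.0) {
                break; // zero covariance: every direction has zero variance
            }
            normalise(w);
            v = w;
        }

        loadings = v;
        variance = dot(v, multiply(cov, v)); // Rayleigh quotient
        scores = new double[n];
        for (int i = 0; i < n; i++) {
            scores[i] = dot(centred[i], v);
        }
    }

    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            s += a[i] * b[i];
        }
        return s;
    }

    static double[] multiply(double[][] m, double[] v) {
        double[] out = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            out[i] = dot(m[i], v);
        }
        return out;
    }

    static void normalise(double[] v) {
        double norm = Math.sqrt(dot(v, v));
        for (int j = 0; j < v.length; j++) {
            v[j] /= norm;
        }
    }

    public static void main(String[] args) {
        double[][] data = { {2.5, 2.4}, {0.5, 0.7}, {2.2, 2.9}, {1.9, 2.2}, {3.1, 3.0} };
        FirstPrincipalComponent pc = new FirstPrincipalComponent(data, 200);
        System.out.println("variance: " + pc.variance);
        System.out.println("loadings: " + Arrays.toString(pc.loadings));
        System.out.println("scores:   " + Arrays.toString(pc.scores));
    }
}
```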

Thiago Bassani.


From eibe at cs.waikato.ac.nz  Mon Jan 17 08:53:13 2005
From: eibe at cs.waikato.ac.nz (Eibe Frank)
Date: Mon Jan 17 08:53:30 2005
Subject: [Wekalist] Disease Classification Problem
In-Reply-To: <E1CpbCR-0008WG-L4@ghoul.scms.waikato.ac.nz>
References: <E1CpbCR-0008WG-L4@ghoul.scms.waikato.ac.nz>
Message-ID: <41C520F2-67F8-11D9-A5A9-000A959DE03E@cs.waikato.ac.nz>

This sounds like a very interesting dataset. Yes, Weka should be able 
to cope with it, but you need to split the problem into 221 different 
classification problems. (Weka currently can't do vector-valued 
classification.)

Most classifiers in Weka will give you an estimated probability for 
each of the possible classifications. I think this might provide you 
with the information you are after. (However, it doesn't give you 
confidence intervals for its probability estimates, so you won't know 
how sure it is about their exact value. This information would 
definitely improve their usefulness, in particular, in a medical 
application.)
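Splitting the problem this way is often called binary relevance: one yes/no target per disease, each learned separately. The JDK-only sketch below (hypothetical names; disease indices are assumed to be 0-based) builds those target columns from per-record diagnosis lists. It also shows one way to approach the "how many of my predictions are right" question: if the per-disease probabilities are well calibrated, the expected number of true positives among the predicted diseases is the sum of their probabilities (linearity of expectation). This gives an expected value only, not the confidence interval mentioned above, and the probabilities used here are invented.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of splitting a multi-label problem into one binary problem per label. */
class BinaryRelevance {

    /** Column j of the result is the yes/no target for disease j (1 = present). */
    static int[][] binaryTargets(List<int[]> diagnosesPerRecord, int numDiseases) {
        int[][] targets = new int[diagnosesPerRecord.size()][numDiseases];
        for (int r = 0; r < diagnosesPerRecord.size(); r++) {
            for (int disease : diagnosesPerRecord.get(r)) {
                targets[r][disease] = 1;
            }
        }
        return targets;
    }

    /**
     * Expected number of correct "yes" predictions: the sum of the probabilities of the
     * labels we predict (probability >= threshold), assuming the probabilities are calibrated.
     */
    static double expectedTruePositives(double[] probabilities, double threshold) {
        double expected = 0.0;
        for (double p : probabilities) {
            if (p >= threshold) {
                expected += p;
            }
        }
        return expected;
    }

    public static void main(String[] args) {
        // Two made-up patients with diagnoses given as disease indices.
        int[][] targets = binaryTargets(List.of(new int[] {0, 2}, new int[] {1}), 3);
        System.out.println(Arrays.deepToString(targets));
        // Made-up per-disease probabilities for one new patient: 6 are predicted "yes".
        double[] probs = {0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.1};
        System.out.println("expected true positives: " + expectedTruePositives(probs, 0.5));
    }
}
```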

Cheers,
Eibe

On Jan 15, 2005, at 12:49 PM, wekalist-request@list.scms.waikato.ac.nz 
wrote:

> Hello all,
>
> I am writing to find out whether WEKA has the capabilities to deal
> with this kind of dataset. The dataset has more than 700 patient
> records, each of which includes 556 symptom attributes and 221
> disease attributes.  The dataset is a bit sparse, with a mean of 7 +/- 4
> diagnoses per record (7 is the mean and 4 the standard deviation) and a
> mean of 50 +/- 20 relevant symptoms per record.
>
> Thus, given a new patient, we need to know not only whether the
> patient has each of these 221 diseases (they may have many), but also
> how likely these predictions are to be correct (e.g. we predict
> 6 diseases, but probably only four of these are true positives...)
>
> How could the classifiers in WEKA do this kind of multiple prediction?
>
> Advice greatly appreciated.
>
> Wenqi Shi
> =============================================
> Room 144,
> School of Computer Science,
> University of Birmingham,
> Edgbaston
> BIRMINGHAM,
> B15 2TT.
> United Kingdom


From eibe at cs.waikato.ac.nz  Mon Jan 17 08:56:15 2005
From: eibe at cs.waikato.ac.nz (Eibe Frank)
Date: Mon Jan 17 08:56:27 2005
Subject: [Wekalist] where is KDTree?
In-Reply-To: <E1Cpxfh-0001iJ-Qg@ghoul.scms.waikato.ac.nz>
References: <E1Cpxfh-0001iJ-Qg@ghoul.scms.waikato.ac.nz>
Message-ID: <AE0D977D-67F8-11D9-A5A9-000A959DE03E@cs.waikato.ac.nz>

We are currently in the process of revising that code. However, you can 
still get the old version from the CVS repository. Information on how 
to access it is on the Weka web page.

The code might also be in one of the old releases. Those are also still 
available from SourceForge.

Cheers,
Eibe

On Jan 16, 2005, at 12:49 PM, wekalist-request@list.scms.waikato.ac.nz 
wrote:

> Hello,
>
> There used to be KDTree (in weka.core.KDTree if I remember correctly)
> indexing in one of the previous releases of WEKA.
> Does anyone know if it has been moved somewhere or removed?
>
> Thanks,
> Mykola
>
> --
> Mykola Pechenizkiy, Ph. Lic.
> Department of Computer Science and Information Systems
> University of Jyväskylä
> P.O. Box 35
> 40351 Jyväskylä
> Finland


From bthom at cs.hmc.edu  Mon Jan 17 14:12:57 2005
From: bthom at cs.hmc.edu (belinda thom)
Date: Mon Jan 17 14:12:15 2005
Subject: [Wekalist] terminology questions: functional approximation /
	classification
In-Reply-To: <6100CC67-65CA-11D9-8F91-000A959DE03E@cs.waikato.ac.nz>
References: <E1CpETH-0002gE-A8@ghoul.scms.waikato.ac.nz>
	<6100CC67-65CA-11D9-8F91-000A959DE03E@cs.waikato.ac.nz>
Message-ID: <EBEECBAA-6824-11D9-8625-000D93ACC694@cs.hmc.edu>

Eibe,

Thanks again for your input. Now, finally, to follow up.

On Jan 13, 2005, at 5:19 PM, Eibe Frank wrote:

> Yes, the notion of classifiers in Weka is confusing. Some of them can 
> predict numeric targets, others nominal ones, and some both types.
>
> Note that this is not restricted to weka.classifiers.functions. When 
> we decided to divide the classifiers in Weka into sub packages we had 
> to decide on names, and we chose the name "functions" to refer to the 
> models produced by the algorithms in that package because people 
> generally think of them as mathematical functions. We weren't really 
> all that happy with the name but couldn't think of anything better.

I understand the issue of naming is not clear-cut. A suggestion:
perhaps just a sentence or two somewhere early on stating clearly
that Weka handles both numeric and nominal supervised learning, and
that the two are very different (they usually minimize different
error functions). Such a notice up front would have saved us about
half a day :-)

> Note that Weka doesn't distinguish between real or integer-valued 
> attributes. They are both "numeric".

Fair enough. Does Weka have functions for explicitly learning rankings?

> I'm not quite sure what you mean by "prior sorce forge is needed for 
> some of the regression to work properly". All methods in Weka are 
> supposed to be equally stable. If something doesn't work properly we 
> would like to hear about it, and we will endeavor to fix it, no matter 
> what algorithm in Weka has the problem. However, it is true that some 
> bug fixes are only in the CVS repository because we haven't had a 
> chance to make a new release.

Here's a list of issues that I got from my student, Aaron Arvey, who 
has been playing with Weka in order to verify that it could be used for 
numeric supervised learning. He's tried posting it to the list several 
times (and joined and confirmed) but for some reason his posts keep 
bouncing.

--------------------------------------

The issue I have had with the functions category of classifiers is
the following, using JVM 1.4.2:

java -Xmx2000m -oss2000m -classpath weka.jar
weka.classifiers.functions.MultilayerPerceptron -x 2 -t ./data/cpu.arff

causes a crash followed by a heap dump. LinearRegression and other
functions cause similar behavior. However, when I run the exact same
command using JVM 1.5, everything works just dandy. In both
circumstances I am using the Sun JVM on Linux Fedora Core 2.

Weka release 3-3-6 doesn't need the newer JVM; it works just fine for
LinearRegression using JVM 1.4.2. I thought this was a little weird,
but I'm no longer familiar enough with Java to go digging around in the
internals.

This seems to at least resolve the final part of this email.

Aaron

----------------------

Thanks,

--b


From eibe at cs.waikato.ac.nz  Mon Jan 17 15:18:29 2005
From: eibe at cs.waikato.ac.nz (Eibe Frank)
Date: Mon Jan 17 15:18:38 2005
Subject: [Wekalist] terminology questions: functional approximation /
	classification
In-Reply-To: <EBEECBAA-6824-11D9-8625-000D93ACC694@cs.hmc.edu>
References: <E1CpETH-0002gE-A8@ghoul.scms.waikato.ac.nz>
	<6100CC67-65CA-11D9-8F91-000A959DE03E@cs.waikato.ac.nz>
	<EBEECBAA-6824-11D9-8625-000D93ACC694@cs.hmc.edu>
Message-ID: <13A7E56E-682E-11D9-A5A9-000A959DE03E@cs.waikato.ac.nz>


On Jan 17, 2005, at 2:12 PM, belinda thom wrote:

> Fair enough. Does Weka have functions for explicitly learning rankings?

You can use the predicted class probabilities to generate a ranking. 
But there is no algorithm for learning from a ranked list of choices 
associated with every instance, if that's what you mean.
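As a minimal sketch of that idea (plain Java, not a Weka API; the probabilities are hypothetical and would in practice be each instance's predicted probability for the class of interest), generating the ranking just means sorting the instances by that probability:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch: turn predicted class probabilities into a ranking of instances. */
class ProbabilityRanking {

    /**
     * Returns instance indices ordered from most to least likely to belong to
     * the class of interest; probs[i] is the predicted probability for instance i.
     */
    static List<Integer> rank(double[] probs) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < probs.length; i++) {
            order.add(i);
        }
        // Descending probability; List.sort is stable, so ties keep input order.
        order.sort(Comparator.comparingDouble((Integer i) -> probs[i]).reversed());
        return order;
    }

    public static void main(String[] args) {
        double[] probs = {0.2, 0.9, 0.5, 0.7};
        System.out.println(rank(probs)); // prints [1, 3, 2, 0]
    }
}
```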

> The issue I have had with the functions category of classifiers is
> the following, using JVM 1.4.2:
>
> java -Xmx2000m -oss2000m -classpath weka.jar
> weka.classifiers.functions.MultilayerPerceptron -x 2 -t ./data/cpu.arff
>
> causes a crash followed by a heap dump. LinearRegression and other
> functions cause similar behavior. However, when I run the exact same
> command using JVM 1.5, everything works just dandy. In both
> circumstances I am using the Sun JVM on Linux Fedora Core 2.
>
> Weka release 3-3-6 doesn't need the newer JVM; it works just fine for
> LinearRegression using JVM 1.4.2. I thought this was a little weird,
> but I'm no longer familiar enough with Java to go digging around in the
> internals.
>
> This seems to at least resolve the final part of this email.

It sounds like the problem is a buggy JVM and not Weka.

Cheers,
Eibe


From sionep at xtra.co.nz  Mon Jan 17 16:06:38 2005
From: sionep at xtra.co.nz (sione)
Date: Mon Jan 17 16:02:37 2005
Subject: [Wekalist] Re: Principal Components Analysis algorithm
In-Reply-To: <20050116233522.EZKR560.mta5-rme.xtra.co.nz@ghoul.scms.waikato.ac.nz>
References: <20050116233522.EZKR560.mta5-rme.xtra.co.nz@ghoul.scms.waikato.ac.nz>
Message-ID: <41EB2BBE.1040107@xtra.co.nz>


>Message: 1
>Date: Sat, 15 Jan 2005 23:29:47 -0200
>From: tbassani@ppgia.pucpr.br
>Subject: [Wekalist] Principal Components Analysis algorithm 
>To: wekalist@list.scms.waikato.ac.nz
>Message-ID: <1105838986.41e9c38b007b6@wwws.ppgia.pucpr.br>
>Content-Type: text/plain; charset=ISO-8859-1
>
>Hello,
>
>I am working with the PCA algorithm, and WEKA supports only a simple version,
>for attribute selection. However, I need:
>-the matrix of principal components,
>-the component scores,
>-the component variances.
>
>Moreover, I need the rotation of the factor analysis or principal components
>analysis loadings.
>
>Is there a plug-in that could give me some of this information?
>
>Any Information could be helpful.
>
>Thiago Bassani.
>  
>

Thiago,

Refer to the archive for my previous posts about PCA.

1) https://list.scms.waikato.ac.nz/pipermail/wekalist/2005-January/003244.html

2) https://list.scms.waikato.ac.nz/pipermail/wekalist/2005-January/003245.html

You could modify the constructor to take a double[][] parameter as
follows:

public PCA(double[][] X) {
  // wrap the input array in a Matrix and keep it as the raw data
  this.rawData = new Matrix(X);
}
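The PCA class from those earlier posts is not reproduced here, so the following is an independent, self-contained sketch rather than Sione's code or Weka's own PrincipalComponents. It computes the three quantities asked for: the principal axes (eigenvectors of the sample covariance matrix), the component variances (the corresponding eigenvalues) and the component scores (the centred data projected onto the axes). The eigen-decomposition uses cyclic Jacobi rotations, chosen here only because the method is short; rotation of the loadings (e.g. varimax) is not included, and the class name SimplePCA is hypothetical.

```java
import java.util.Arrays;

/** Sketch: PCA from the sample covariance matrix via cyclic Jacobi rotations. */
class SimplePCA {

    final double[] variances;    // component variances (eigenvalues), largest first
    final double[][] components; // components[r] = r-th principal axis (unit eigenvector)
    final double[][] scores;     // scores[i][r] = centred row i projected on components[r]

    /** x[i][j] = value of attribute j for instance i; needs at least two rows. */
    SimplePCA(double[][] x) {
        int n = x.length, d = x[0].length;
        // Centre every column on its mean.
        double[] mean = new double[d];
        for (double[] row : x) for (int j = 0; j < d; j++) mean[j] += row[j] / n;
        double[][] c = new double[n][d];
        for (int i = 0; i < n; i++) for (int j = 0; j < d; j++) c[i][j] = x[i][j] - mean[j];

        // Sample covariance matrix (divisor n - 1).
        double[][] cov = new double[d][d];
        for (int a = 0; a < d; a++)
            for (int b = 0; b < d; b++) {
                double s = 0;
                for (int i = 0; i < n; i++) s += c[i][a] * c[i][b];
                cov[a][b] = s / (n - 1);
            }

        // Jacobi: rotate cov towards diagonal form; v accumulates the rotations.
        double[][] v = new double[d][d];
        for (int j = 0; j < d; j++) v[j][j] = 1.0;
        double total = 0;
        for (double[] row : cov) for (double e : row) total += e * e;
        for (int sweep = 0; sweep < 100; sweep++) {
            double off = 0;
            for (int p = 0; p < d; p++) for (int q = p + 1; q < d; q++) off += cov[p][q] * cov[p][q];
            if (off <= 1e-24 * total) break;
            for (int p = 0; p < d; p++)
                for (int q = p + 1; q < d; q++) {
                    if (cov[p][q] == 0.0) continue;
                    // Rotation angle chosen so that the (p, q) entry becomes zero.
                    double theta = (cov[q][q] - cov[p][p]) / (2 * cov[p][q]);
                    double t = theta == 0 ? 1.0
                            : Math.signum(theta) / (Math.abs(theta) + Math.sqrt(theta * theta + 1));
                    double cs = 1 / Math.sqrt(t * t + 1), sn = t * cs;
                    for (int k = 0; k < d; k++) { // cov = cov * J and v = v * J
                        double kp = cov[k][p], kq = cov[k][q];
                        cov[k][p] = cs * kp - sn * kq;
                        cov[k][q] = sn * kp + cs * kq;
                        double vp = v[k][p], vq = v[k][q];
                        v[k][p] = cs * vp - sn * vq;
                        v[k][q] = sn * vp + cs * vq;
                    }
                    for (int k = 0; k < d; k++) { // cov = J^T * cov
                        double pk = cov[p][k], qk = cov[q][k];
                        cov[p][k] = cs * pk - sn * qk;
                        cov[q][k] = sn * pk + cs * qk;
                    }
                }
        }

        // Sort components by decreasing variance (diagonal of the rotated matrix).
        Integer[] order = new Integer[d];
        for (int k = 0; k < d; k++) order[k] = k;
        Arrays.sort(order, (k1, k2) -> Double.compare(cov[k2][k2], cov[k1][k1]));
        variances = new double[d];
        components = new double[d][d];
        for (int r = 0; r < d; r++) {
            variances[r] = cov[order[r]][order[r]];
            for (int j = 0; j < d; j++) components[r][j] = v[j][order[r]];
        }

        // Component scores: project the centred data onto each axis.
        scores = new double[n][d];
        for (int i = 0; i < n; i++)
            for (int r = 0; r < d; r++)
                for (int j = 0; j < d; j++) scores[i][r] += c[i][j] * components[r][j];
    }

    public static void main(String[] args) {
        SimplePCA pca = new SimplePCA(new double[][] {{1, 2}, {2, 4}, {3, 6}});
        System.out.println(Arrays.toString(pca.variances));
    }
}
```

Jacobi is adequate for a modest number of attributes; for larger problems a tested library routine, such as the EigenvalueDecomposition class in the Jama package, would be the safer choice.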

Cheers,
Sione.




From s_elaoumari at yahoo.fr  Mon Jan 17 16:42:26 2005
From: s_elaoumari at yahoo.fr (Sanaa EL AOUMARI)
Date: Mon Jan 17 16:42:34 2005
Subject: [Wekalist] HillClimer Algorithm for bayesian network
In-Reply-To: <41EB2BBE.1040107@xtra.co.nz>
Message-ID: <20050117034226.88825.qmail@web60309.mail.yahoo.com>


Hello Everybody,

Does anyone have more details on the HillClimber algorithm implemented in the Bayes classifiers (BayesNet) for Bayesian networks, or know where I can find out how it is implemented in Weka? I can't find it in the Weka data mining book.

Thanks a lot,

Sanaa


		
From siva at bch.umontreal.ca  Tue Jan 18 05:07:07 2005
From: siva at bch.umontreal.ca (Sivakumar Kannan)
Date: Tue Jan 18 05:04:28 2005
Subject: [Wekalist] Newbie Questions - J4.8
Message-ID: <41EBE2AB.70605@bch.umontreal.ca>

Dear all:

I am a new member to the list and this is my first mail. First, I would 
like to thank all the wonderful people (Dr. Frank and Dr. Witten) for 
this amazing tool and the continued support through this mailing list. I 
am a biologist trying to learn machine learning (bioinformatics) so 
please bear with me if my questions are too trivial. Here are my 
questions regarding J4.8

1. I have a dataset of more than 12,000 instances with 80 class labels.
Some class labels have as few as 2 instances, while others have
hundreds. Is there a minimum number of instances (per class label)
required for doing 10-fold cross-validation? When I output the
"Detailed Accuracy By Class", the class labels with very few instances
have a value of "0" for all the statistics (TP Rate, FP Rate,
Precision, etc.), and this affects the overall average precision and
recall. Should I exclude those instances from the dataset?

2. What exactly is resampling? If I use this during preprocessing (with 
default values), I get improved prediction accuracy.

Thanks in advance for any tips or suggestions.

Have a great day!!

Cheers,
Siva

From fracpete at waikato.ac.nz  Tue Jan 18 09:01:53 2005
From: fracpete at waikato.ac.nz (Peter Reutemann)
Date: Tue Jan 18 09:02:05 2005
Subject: [Wekalist] Problem when using Weka Explorerto open large datasets
In-Reply-To: <E030192F65406648905385147031729E49A927@mail03.student.main.ntu.edu.sg>
References: <E030192F65406648905385147031729E49A927@mail03.student.main.ntu.edu.sg>
Message-ID: <41EC19B1.9090304@waikato.ac.nz>

Hey!

A jar archive is basically a zip archive with a different extension. 
Just rename the extension to ".zip" and open it with a program capable 
of dealing with zip (built-in feature if you have WinXP, otherwise 
download a free ZIP program, e.g. 7-Zip: http://www.7-zip.org/ or 
FreeZIP: http://members.ozemail.com.au/~nulifetv/freezip/).
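The same can be done programmatically: Java's own java.util.jar.JarFile reads the archive directly, so no renaming is needed. A minimal sketch (the class name ListJar is hypothetical) that prints every entry of, say, weka-src.jar. With a JDK installed, "jar tf weka-src.jar" lists the entries and "jar xf weka-src.jar" extracts them as well.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Sketch: list the files stored in a jar archive such as weka-src.jar. */
class ListJar {

    static List<String> entryNames(String path) throws IOException {
        List<String> names = new ArrayList<>();
        // JarFile understands the zip structure directly; the file stays closed afterwards.
        try (JarFile jar = new JarFile(path)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ListJar weka-src.jar
        for (String name : entryNames(args[0])) {
            System.out.println(name);
        }
    }
}
```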

Cheers, Peter

#NGUYEN VAN HIEN# wrote:

> Hi,
> Thanks for your help. I would like to ask you one more simple question.
> I tried to open the source code of Weka (file weka-src.jar) in Windows,
> but it failed. Could you tell me how to open it?
> Regards
> Nguyen Van Hien
> 
> -----Original Message-----
> From: Peter Reutemann [mailto:fracpete@waikato.ac.nz] 
> Sent: Thursday, January 13, 2005 3:27 PM
> To: #NGUYEN VAN HIEN#
> Subject: Re: [Wekalist] Problem when using Weka Explorerto open large
> datasets
> 
> No need to apologize! You've only forgotten to specify the classpath 
> option for java and you should quote the path, since it contains a 
> blank. Just use this command to run your WEKA with 256MB of heap size 
> (it is actually one line, even though displayed as two):
> 
> java -Xmx256m -classpath "C:\Program Files\Weka-3-4\weka.jar" 
> weka.gui.explorer.Explorer
> 
> Cheers, Peter
> 
> #NGUYEN VAN HIEN# wrote:
> 
>>Hi,
>>Sorry for asking you one more simple question, but I'm a newcomer
>>to both Weka and Java. I want to use Weka to support my research. I
>>tried to type what you said, but an error message occurred:
>>My command: 
>>java C:\Program Files\Weka-3-4\weka.jar weka.gui.explorer.Explorer
>>
>>Error message:
>>Exception in thread "main" java.lang.NoClassDefFounError
>>
>>Pls tell me how to solve it
>>Thanks
>>Nguyen Van Hien
>>
>>-----Original Message-----
>>From: Peter Reutemann [mailto:fracpete@waikato.ac.nz] 
>>Sent: Thursday, January 13, 2005 11:48 AM
>>To: #NGUYEN VAN HIEN#
>>Cc: wekalist@list.scms.waikato.ac.nz
>>Subject: Re: [Wekalist] Problem when using Weka Explorerto open large
>>datasets
>>
>>Hey!
>>
>>With large datasets it can happen that the default heap size of the 
>>virtual machine is not enough. You can increase the maximum heap size 
>>with the "-Xmx" parameter. You can e.g. start the Explorer with 256MB
>>as heap size:
>>    java -Xmx256m -classpath weka.jar weka.gui.explorer.Explorer
>>
>>Cheers, Peter
>>
>>#NGUYEN VAN HIEN# wrote:
>>
>>
>>>Hi all,
>>>
>>>If I use the Weka Explorer to open the continuous breastCancer dataset
>>>(24481 attributes, 97 instances) on a computer with 512MB of memory, it
>>>fails with an error message like "out of memory". I have not changed
>>>Weka's configuration in any way. I would like to know how to solve the
>>>problem.
>>>
>>>Thanks
>>>
>>>Nguyen Van Hien

-- 
Peter Reutemann, Dept. of Computer Science, University of Waikato
Phone +64 (7) 838-4466 Ext. 8766

From euan.adie at ed.ac.uk  Wed Jan 19 00:09:03 2005
From: euan.adie at ed.ac.uk (Euan Adie)
Date: Wed Jan 19 00:08:48 2005
Subject: [Wekalist] Decision Trees and correlated variables
Message-ID: <41ECEE4F.6010000@ed.ac.uk>

Hi,

I've been using alternating decision trees to solve a bioinformatics 
classification problem. Weka is a fantastic tool - thanks again to those 
involved in maintaining it.

Anyway, I've got a quick question. My feature set is made up of 
variables which sometimes correlate quite highly with one another (say, 
5 out of 20 of them aren't independent). I know that this wouldn't be a 
good thing in a Bayesian type approach, but will it make any difference 
(i.e. skew classification unduly) to a decision tree? Are there any 
issues I should be concerned about?

Thanks for your help,
Euan

---
Euan Adie
Medical Genetics Section, University of Edinburgh
MMC, Western General Hospital
EH42XU Edinburgh
Scotland, UK

From tokugawa98 at gmx.net  Wed Jan 19 00:55:48 2005
From: tokugawa98 at gmx.net (Albrecht Zimmermann)
Date: Wed Jan 19 00:55:58 2005
Subject: [Wekalist] Decision Trees and correlated variables
In-Reply-To: <41ECEE4F.6010000@ed.ac.uk>
References: <41ECEE4F.6010000@ed.ac.uk>
Message-ID: <41ECF944.5080908@gmx.net>

Hi,

normally it shouldn't be much of a problem since correlated variables 
would lose discriminating power once a variable they correlate with has 
been picked as test attribute.
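To illustrate the point, here is a small Java sketch using ID3/C4.5-style information gain on made-up binary data. This is an assumption for illustration: alternating decision trees score splits differently, so it shows the general effect rather than ADTree's exact criterion. Attribute b is a perfect copy of a, so both look equally useful at the root, but once the tree has split on a, b has zero gain in both branches.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: information gain of a binary attribute before and after splitting on a correlated one. */
class CorrelatedGain {

    /** Entropy (bits) of a 0/1 label array. */
    static double entropy(int[] labels) {
        if (labels.length == 0) return 0.0;
        int pos = 0;
        for (int y : labels) pos += y;
        double p = (double) pos / labels.length;
        return term(p) + term(1 - p);
    }

    private static double term(double p) {
        return p <= 0 ? 0.0 : -p * Math.log(p) / Math.log(2);
    }

    /** Values of `values` for the instances whose attribute equals v. */
    static int[] select(int[] attr, int[] values, int v) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < attr.length; i++) if (attr[i] == v) out.add(values[i]);
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Information gain (bits) of splitting the labels on a 0/1 attribute. */
    static double gain(int[] attr, int[] labels) {
        double remainder = 0.0;
        for (int v = 0; v <= 1; v++) {
            int[] branch = select(attr, labels, v);
            remainder += (double) branch.length / labels.length * entropy(branch);
        }
        return entropy(labels) - remainder;
    }

    public static void main(String[] args) {
        int[] a = {0, 0, 0, 0, 1, 1, 1, 1};
        int[] b = a.clone();                 // b is perfectly correlated with a
        int[] y = {0, 0, 0, 1, 1, 1, 1, 0};  // hypothetical class labels
        System.out.printf("root: gain(a) = %.3f, gain(b) = %.3f%n", gain(a, y), gain(b, y));
        // After splitting on a, b adds no information in either branch.
        for (int v = 0; v <= 1; v++) {
            double g = gain(select(a, b, v), select(a, y, v));
            System.out.printf("branch a=%d: gain(b) = %.3f%n", v, g);
        }
    }
}
```

With only partially correlated attributes the remaining gain is small rather than exactly zero, which matches the "normally it shouldn't be much of a problem" above.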

                            G, A

Euan Adie wrote:
> Hi,
> 
> I've been using alternating decision trees to solve a bioinformatics 
> classification problem. Weka is a fantastic tool - thanks again to those 
> involved in maintaining it.
> 
> Anyway, I've got a quick question. My feature set is made up of 
> variables which sometimes correlate quite highly with one another (say, 
> 5 out of 20 of them aren't independent). I know that this wouldn't be a 
> good thing in a Bayesian type approach, but will it make any difference 
> (i.e. skew classification unduly) to a decision tree? Are there any 
> issues I should be concerned about?
> 
> Thanks for your help,
> Euan
> 
> ---
> Euan Adie
> Medical Genetics Section, University of Edinburgh
> MMC, Western General Hospital
> EH42XU Edinburgh
> Scotland, UK

-- 
Albrecht Zimmermann
Machine Learning and Natural Language Processing Lab
Institute for Computer Science
Albert-Ludwigs-University Freiburg
Georges-Koehler-Allee 79
79110 Freiburg
Germany
phone: +49-761-203-8012
fax:   +49-761-203-8007

From luciana.bucene at agr.unicamp.br  Wed Jan 19 05:08:23 2005
From: luciana.bucene at agr.unicamp.br (Luciana Corpas Bucene)
Date: Wed Jan 19 05:08:31 2005
Subject: [Wekalist] problems format date in arff
Message-ID: <1286.200.0.70.148.1106064503.squirrel@new.host.name>

Hi,
I seem to be having problems properly defining the format of a date field
in my .arff file. My Weka is version 3-4.
My .arff file is:
@atribute name date "yyyy-mm-dd"
and my dates are:
{1964-10-17,1964-10-18,1964-10-19,1964-10-20,1964-10-21,1964-10-22,1964-10-23,1964-10-24,1964-10-25,1964-10-26,1964-10-27,1964-10-28,1964-10-29,1964-10-30,1964-10-31,1964-11-01,1964-11-02,1964-11-03,1964-11-04,1964-11-05,1964-11-06,1964-11-07,1964-11-08,1964-11-09,1964-11-10,1964-11-11,1964-11-12,1964-11-13,1964-11-14,1964-11-15,1964-11-16,1964-11-17,1964-11-18).
Help me.
Thank you.
Lu.




From fracpete at waikato.ac.nz  Wed Jan 19 08:57:50 2005
From: fracpete at waikato.ac.nz (Peter Reutemann)
Date: Wed Jan 19 08:57:55 2005
Subject: [Wekalist] problems format date in arff
In-Reply-To: <1286.200.0.70.148.1106064503.squirrel@new.host.name>
References: <1286.200.0.70.148.1106064503.squirrel@new.host.name>
Message-ID: <41ED6A3E.10009@waikato.ac.nz>

Hey!

The correct date format string would be with capital "M" (the lower case 
"m" is 'Minute' and not 'Month'), i.e. your definition should look like 
this:
	@attribute name date "yyyy-MM-dd"
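Since ARFF date formats use the java.text.SimpleDateFormat pattern syntax, the difference is easy to see in plain Java. In this minimal sketch (the class and method names are hypothetical), the same string is parsed with both patterns and printed with an unambiguous pattern. With "mm", the 10 is read as minutes and the month falls back to January.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Sketch: why "yyyy-mm-dd" and "yyyy-MM-dd" parse the same string differently. */
class DatePatternDemo {

    static String reparse(String pattern, String text) throws ParseException {
        SimpleDateFormat in = new SimpleDateFormat(pattern, Locale.ROOT);
        in.setTimeZone(TimeZone.getTimeZone("UTC"));
        Date d = in.parse(text);
        // Print with an unambiguous pattern: MM = month, HH:mm = hour and minute.
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm", Locale.ROOT);
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        return out.format(d);
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(reparse("yyyy-MM-dd", "1964-10-17")); // 1964-10-17 00:00
        System.out.println(reparse("yyyy-mm-dd", "1964-10-17")); // 1964-01-17 00:10
    }
}
```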

You can find the different kinds of format strings in the Javadoc of the 
class "java.text.SimpleDateFormat":
http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html

And some examples of attribute definitions can be found here on the WEKA 
homepage:
http://www.cs.waikato.ac.nz/~ml/weka/arff.html

BTW, you spelled "attribute" with only one "t", which might have resulted 
in an error message, too.

Cheers, Peter

Luciana Corpas Bucene wrote:

> Hi,
> I seem to be having problems properly defining the format of a date field
> in my .arff file. My Weka is version 3-4.
> My .arff file is:
> @atribute name date "yyyy-mm-dd"
> and my date are:
> {1964-10-17,1964-10-18,1964-10-19,1964-10-20,1964-10-21,1964-10-22,1964-10-23,1964-10-24,1964-10-25,1964-10-26,1964-10-27,1964-10-28,1964-10-29,1964-10-30,1964-10-31,1964-11-01,1964-11-02,1964-11-03,1964-11-04,1964-11-05,1964-11-06,1964-11-07,1964-11-08,1964-11-09,1964-11-10,1964-11-11,1964-11-12,1964-11-13,1964-11-14,1964-11-15,1964-11-16,1964-11-17,1964-11-18).
> Help-me.
> Thank you.
> Lu.

-- 
Peter Reutemann, Dept. of Computer Science, University of Waikato
Phone +64 (7) 838-4466 Ext. 8766

From luciana.bucene at agr.unicamp.br  Thu Jan 20 02:26:18 2005
From: luciana.bucene at agr.unicamp.br (Luciana Corpas Bucene)
Date: Thu Jan 20 02:26:25 2005
Subject: [Wekalist] size weka
Message-ID: <1113.200.0.70.148.1106141178.squirrel@new.host.name>


Hi,
What is the maximum size of matrix (dataset) that Weka can open?
Thank you
Lu



From fracpete at waikato.ac.nz  Thu Jan 20 13:57:45 2005
From: fracpete at waikato.ac.nz (Peter Reutemann)
Date: Thu Jan 20 13:57:51 2005
Subject: [Wekalist] problems format date in arff
In-Reply-To: <1405.200.0.70.148.1106131098.squirrel@new.host.name>
References: <1286.200.0.70.148.1106064503.squirrel@new.host.name>
	<41ED6A3E.10009@waikato.ac.nz>
	<1405.200.0.70.148.1106131098.squirrel@new.host.name>
Message-ID: <41EF0209.8030308@waikato.ac.nz>

Luciana,

WEKA is reading the file correctly (I assume you do this with the 
Explorer), otherwise you wouldn't see the min/max/etc. of the date 
attribute. Date attributes are internally handled as doubles, which is 
why you get the min/max/etc. The "Unknown" only showed up due to a minor 
display bug in the Explorer (the fixed source is now in the CVS, i.e. 
"Date" is now displayed if it is indeed a date attribute).
Since you're new to data mining: if the date attribute is your class 
attribute, you will need a classifier that can handle a numeric class, 
like e.g. "weka.classifiers.trees.M5P".
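To see why min/max/mean make sense for a date attribute, here is a minimal sketch in plain Java. It assumes the internal double is the number of milliseconds since 1 January 1970 UTC, i.e. the value java.util.Date.getTime() returns; the class name and sample dates are hypothetical. Statistics are computed on the doubles and converted back to dates only for display.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Sketch: summarise date values after storing them as plain doubles. */
class DateStats {

    private static SimpleDateFormat format() {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    }

    /** Parses each date to a double: milliseconds since 1970-01-01 UTC (assumed encoding). */
    static double[] toDoubles(String[] dates) throws ParseException {
        SimpleDateFormat f = format();
        double[] values = new double[dates.length];
        for (int i = 0; i < dates.length; i++) {
            values[i] = f.parse(dates[i]).getTime();
        }
        return values;
    }

    /** Formats a double produced by toDoubles back into a date string. */
    static String toDate(double value) {
        return format().format(new Date((long) value));
    }

    public static void main(String[] args) throws ParseException {
        double[] v = toDoubles(new String[] {"1964-10-17", "1964-10-18", "1964-11-18"});
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY, sum = 0;
        for (double x : v) {
            min = Math.min(min, x);
            max = Math.max(max, x);
            sum += x;
        }
        System.out.println("min  = " + toDate(min));            // 1964-10-17
        System.out.println("max  = " + toDate(max));            // 1964-11-18
        System.out.println("mean = " + toDate(sum / v.length)); // 1964-10-28
    }
}
```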

Hope that helps!

Cheers, Peter

Luciana Corpas Bucene wrote:

> Peter,
> I'm new to Weka and to data mining too.
> I did what you recommended; however, when I open the file in Weka, the
> type of the data is shown as "Unknown", and it gives the minimum, maximum,
> mean and StdDev of the date values. Because of this, I believe that Weka
> is not reading the format correctly. Is it right that the type of the
> date attribute is shown as "Unknown"?
> Thank you, and I look forward to your reply.
> Luciana.
> 
> 
>>Hey!
>>
>>The correct date format string would be with capital "M" (the lower case
>>"m" is 'Minute' and not 'Month'), i.e. your definition should look like
>>this:
>>	@attribute name date "yyyy-MM-dd"
>>
>>You can find the different kinds of format strings in the Javadoc of the
>>class "java.text.SimpleDateFormat":
>>http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html
>>
>>And some examples of attribute definitions can be found here on the WEKA
>>homepage:
>>http://www.cs.waikato.ac.nz/~ml/weka/arff.html
>>
>>BTW You spelled attribute with only one "t", which might have resulted
>>in an error message, too.
>>
>>Cheers, Peter
>>
>>Luciana Corpas Bucene wrote:
>>
>>
>>>Hi,
>>>I seem to be having problems properly defining the format of a date
>>>field
>>>in my .arff file. My Weka is version 3-4.
>>>My .arff file is:
>>>@atribute name date "yyyy-mm-dd"
>>>and my date are:
>>>{1964-10-17,1964-10-18,1964-10-19,1964-10-20,1964-10-21,1964-10-22,1964-10-23,1964-10-24,1964-10-25,1964-10-26,1964-10-27,1964-10-28,1964-10-29,1964-10-30,1964-10-31,1964-11-01,1964-11-02,1964-11-03,1964-11-04,1964-11-05,1964-11-06,1964-11-07,1964-11-08,1964-11-09,1964-11-10,1964-11-11,1964-11-12,1964-11-13,1964-11-14,1964-11-15,1964-11-16,1964-11-17,1964-11-18).
>>>Help-me.
>>>Thank you.
>>>Lu.
>>
>>--
>>Peter Reutemann, Dept. of Computer Science, University of Waikato
>>Phone +64 (7) 838-4466 Ext. 8766
>>
> 
> 
> 

-- 
Peter Reutemann, Dept. of Computer Science, University of Waikato
Phone +64 (7) 838-4466 Ext. 8766

From fracpete at waikato.ac.nz  Thu Jan 20 14:06:14 2005
From: fracpete at waikato.ac.nz (Peter Reutemann)
Date: Thu Jan 20 14:06:18 2005
Subject: [Wekalist] size weka
In-Reply-To: <1113.200.0.70.148.1106141178.squirrel@new.host.name>
References: <1113.200.0.70.148.1106141178.squirrel@new.host.name>
Message-ID: <41EF0406.6000206@waikato.ac.nz>

Hey!

The amount of data WEKA can load is limited by the available memory and 
by the heap size of the Java virtual machine. Instructions for setting 
the heap size can be found, e.g., here on the WEKA list:
https://list.scms.waikato.ac.nz/mailman/htdig/wekalist/2001-December/000062.html
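As a rough back-of-the-envelope check, a dense numeric dataset needs at least one 8-byte double per value, and Runtime.getRuntime().maxMemory() reports roughly the heap limit set with -Xmx. The sketch below applies this to the breastCancer dataset mentioned earlier on the list (97 instances, 24481 attributes): about 19 MB for the raw values alone. Real usage is higher (object overhead, attribute information, and copies made by filters or the Explorer), so treat the figure as a lower bound; the class name HeapCheck is hypothetical.

```java
/** Sketch: rough check of whether a dense numeric dataset fits in the JVM heap. */
class HeapCheck {

    /** Bytes for the raw attribute values alone: one 8-byte double per cell. */
    static long rawBytes(long instances, long attributes) {
        return instances * attributes * 8L;
    }

    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory(); // roughly the -Xmx setting
        long need = rawBytes(97, 24481);             // 18,997,256 bytes
        System.out.printf("raw values: %.1f MB, max heap: %.1f MB%n",
                need / 1e6, max / 1e6);
    }
}
```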

You can search the Wekalist Archive here:
https://list.scms.waikato.ac.nz/mailman/htdig/wekalist/

Cheers, Peter

Luciana Corpas Bucene wrote:

> Hi,
> Which the maximum size of the matrix that the Weka open?
> Thank you
> Lu

-- 
Peter Reutemann, Dept. of Computer Science, University of Waikato
Phone +64 (7) 838-4466 Ext. 8766

From EricAltendorf at orst.edu  Thu Jan 20 21:58:13 2005
From: EricAltendorf at orst.edu (Eric Altendorf)
Date: Thu Jan 20 22:06:04 2005
Subject: [Wekalist] Help with subclassing BayesNet
Message-ID: <200501200058.14306.EricAltendorf@orst.edu>


Hello all,

I have an implementation of a constrained estimator for CPTs which I 
would like to integrate in the Weka framework for ease of running 
comparative experiments.  I am therefore trying to understand the 
classes relating to Bayes nets in Weka.  My needs are to 
subclass/override:

1) The loading of BIF files.  My implementation makes use of an 
extended BIF format which contains additional information specifying 
various qualitative constraints.

2) The estimation of CPT parameters from data.  My implementation 
takes the qualitative constraints from the BIF file and converts them 
to numeric constraints on CPT cells, the values in which I then 
estimate by constrained maximum likelihood given the total counts.

I do not need any structure learning algorithms (we always specify the 
network), and I do not need to specify any prior (we always assume a 
uniform Dirichlet). This should make things relatively simple.
However, I'm still not sure where to start.  
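For reference, the unconstrained estimate that a constrained MLE would replace is a simple normalisation of the collected counts. The following minimal Java sketch is not Weka's BayesNetEstimator API (the class and method names are hypothetical). It computes the posterior-mean CPT under a uniform Dirichlet prior with pseudo-count alpha per cell; alpha = 0 gives the plain maximum-likelihood estimate. A constrained estimator would replace the row-wise normalisation step with an optimisation that respects the numeric constraints.

```java
import java.util.Arrays;

/** Sketch: CPT estimate from counts under a uniform Dirichlet prior (no constraints). */
class CptEstimate {

    /**
     * counts[j][k] = number of training instances with parent configuration j
     * and child value k; alpha = pseudo-count added to every cell.
     * Returns (N_jk + alpha) / (N_j + r * alpha), so each row sums to 1.
     * With alpha = 0, a row without any counts would give NaN.
     */
    static double[][] estimate(double[][] counts, double alpha) {
        double[][] cpt = new double[counts.length][];
        for (int j = 0; j < counts.length; j++) {
            int r = counts[j].length;   // number of child values
            double total = 0;           // N_j
            for (double n : counts[j]) total += n;
            cpt[j] = new double[r];
            for (int k = 0; k < r; k++) {
                cpt[j][k] = (counts[j][k] + alpha) / (total + r * alpha);
            }
        }
        return cpt;
    }

    public static void main(String[] args) {
        double[][] counts = {{3, 1}, {0, 0}}; // hypothetical counts, binary child
        for (double[] row : estimate(counts, 1.0)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```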

I've been looking at the BayesNetEstimator class, but not sure if I 
should extend it, or any of its subclasses.  I'm also not sure at 
which point I should run my constrained MLE -- after all counts are 
collected, but before anyone asks for CPT parameters.  Finally, I'm 
not sure where to put my BIF loading code, and how to specify the use 
of it rather than something else.  Is it sufficient to provide a new 
implementation of BayesNetEstimator, or do I need a new subclass of 
BayesNet as well that will call the proper XML BIF loader?

Thank you all in advance,

-- 
Eric Altendorf    
Oregon State University

From weka at gijs.triple-it.nl  Thu Jan 20 23:32:08 2005
From: weka at gijs.triple-it.nl (Gijs Zonneveld)
Date: Thu Jan 20 23:33:42 2005
Subject: [Wekalist] instanced-based learning with WEKA
Message-ID: <41EF88A8.5040302@gijs.triple-it.nl>

Hi all,

Maybe one of you can help me with the following, because I have not
been able to figure it out in the past 2 days.

I would like to do instance-based learning with the IBk algorithm.
Since I now know roughly how WEKA works from doing the tutorials, I want
to use WEKA for my own analysis.

To create a simple example, I used the weather file.
I created a "training set" and a "test set".
Now I want the items of the "test set" to be classified based on
the training set.

When I do it the way I think it should be done, I get the results below.

Can somebody explain how to get this done?

Thanks,

Gijs


-------------------
=== Run information ===

Scheme:       weka.classifiers.lazy.IBk -K 1 -W 0
Relation:     weather
Instances:    10
Attributes:   5
              outlook
              temperature
              humidity
              windy
              play
Test mode:    user supplied test set: 4 instances

=== Classifier model (full training set) ===

IB1 instance-based classifier
using 1 nearest neighbour(s) for classification


Time taken to build model: 0 seconds

=== Evaluation on test set ===
=== Summary ===

Total Number of Instances                0

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall  F-Measure   Class
  0         0          0         0         0        yes
  0         0          0         0         0        no

=== Confusion Matrix ===

 a b   <-- classified as
 0 0 | a = yes
 0 0 | b = no

--------------------------------------



The training file is:
---------------------
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes




The test file is:
-------------------------
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data

rainy,68,80,FALSE,?
rainy,65,70,TRUE,?
overcast,64,65,TRUE,?
sunny,72,95,FALSE,?
sunny,69,70,FALSE,?
rainy,75,80,FALSE,?
sunny,75,70,TRUE,?
overcast,72,90,TRUE,?
overcast,81,75,FALSE,?
rainy,71,91,TRUE,?




From THOMAS.C.JOHNSON at saic.com  Fri Jan 21 05:40:15 2005
From: THOMAS.C.JOHNSON at saic.com (Johnson, Thomas C.)
Date: Fri Jan 21 05:40:38 2005
Subject: [Wekalist] J48 Decision Tree Output
Message-ID: <71BD1AD1D6DAF5488C97EF358746AAA406EEDB@nebula.apd.saic.com>

I thought I understood this, but now I'm not sure.  Something's not adding
up!

I used the command line with separate training and test files.  The training
file had about twice as many instances as the test file.

My decision tree output has a line that looks like

|   |   |   |   |   ATTR_34 = 0000: T (173.14/10.3)


What exactly do the numbers in parentheses mean? 

Thanks,
--TcJ

From jason.brownlee at internode.on.net  Fri Jan 21 10:06:05 2005
From: jason.brownlee at internode.on.net (Jason Brownlee)
Date: Fri Jan 21 10:06:31 2005
Subject: [Wekalist] AIRS for WEKA
Message-ID: <016301c4ff33$db2ab280$0200a8c0@Jason>

All,



I've prepared a version of the AIRS (Artificial Immune Recognition System) classification algorithm for WEKA. The implementation contains three versions of the technique: AIRS1, AIRS2 and Parallel AIRS. Please see http://www.it.swin.edu.au/centres/ciscp/ais/ for further details.


Sincerely, 


Jason
From rjsteckel at impactsci.com  Fri Jan 21 10:42:10 2005
From: rjsteckel at impactsci.com (Ryan Steckel)
Date: Fri Jan 21 10:42:23 2005
Subject: [Wekalist] High number of classes in ARFF
Message-ID: <AF535237618D884BAE36DF8B0437B97908564E@benhur.denver.impactsci.com>

I'm trying to build a classifier. I have a dataset with around 260K
instances. Each instance has 4 attributes. The fourth attribute is the
class. But there are about 560 different classes. Will I have to list
all 560 different classes in the attribute section of the ARFF, or is
there a better way to do it?
 
Thanks,
Ryan Steckel


From euan.adie at ed.ac.uk  Sun Jan 23 01:54:03 2005
From: euan.adie at ed.ac.uk (Euan Adie)
Date: Sun Jan 23 01:53:49 2005
Subject: [Wekalist] Rough sets
Message-ID: <41F24CEB.6000800@ed.ac.uk>

Hi,

Has anybody done any work with rough sets using Weka as a framework? 
(and ideally: has anybody got any code they'd be willing to share? :) )

A quick search of the mailing list showed that Ola Leifler mentioned 
something a while back, but the URLs in the papers he refers to are now 
defunct.

Cheers,
Euan

From ydong at fau.edu  Sun Jan 23 07:24:42 2005
From: ydong at fau.edu (Yuhong Dong)
Date: Sun Jan 23 07:24:50 2005
Subject: [Wekalist] help on sequence data clustering
Message-ID: <6215897.1106418282772.JavaMail.ydong@fau.edu>

 Hi guys,

I am just a beginner with Weka. I hope to use WEKA to find anomalous
sequences with an unsupervised method, such as a clustering algorithm.

I have sequence data in a format like this:
sequence 1 : A B C D E F
sequence 2 : A B C H
sequence 3 : A B D H F H D H

Could somebody recommend a method that can be used in
this case?
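Most Weka clusterers expect fixed-length attribute vectors, so variable-length sequences usually need to be turned into features first. One possible approach, chosen here purely for illustration, is to count adjacent symbol pairs (bigrams): each sequence then becomes a row with one numeric attribute per bigram (0 where the pair does not occur), and unusual sequences may stand out as small or distant clusters. A minimal Java sketch, with hypothetical names:

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch: turn variable-length symbol sequences into bigram counts. */
class BigramFeatures {

    /** Counts adjacent symbol pairs, e.g. "A B C" gives {A_B=1, B_C=1}. */
    static Map<String, Integer> bigrams(String sequence) {
        String[] s = sequence.trim().split("\\s+");
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i + 1 < s.length; i++) {
            counts.merge(s[i] + "_" + s[i + 1], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] data = {"A B C D E F", "A B C H", "A B D H F H D H"};
        for (String seq : data) {
            // One map per sequence; the union of all keys gives the attribute set.
            System.out.println(bigrams(seq));
        }
    }
}
```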

Thanks all,
Alice Dong

From eibe at cs.waikato.ac.nz  Mon Jan 24 09:21:56 2005
From: eibe at cs.waikato.ac.nz (Eibe Frank)
Date: Mon Jan 24 09:22:03 2005
Subject: [Wekalist] instanced-based learning with WEKA
In-Reply-To: <E1Crleh-0003fL-Ro@ghoul.scms.waikato.ac.nz>
References: <E1Crleh-0003fL-Ro@ghoul.scms.waikato.ac.nz>
Message-ID: <6D405C4D-6D7C-11D9-918E-000A959DE03E@cs.waikato.ac.nz>

In the Explorer, check "Output predictions..." (note that the actual 
text might be slightly different) under "More Options...".

On the command line, you can use the -p option.
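Since the class values in the test file are all "?", there is nothing to compare the predictions against, so what is needed are the predictions themselves. To show what IBk with -K 1 computes conceptually, here is a minimal Java sketch on the four training instances from this thread. It assumes a distance in which numeric differences are scaled by the training range and nominal attributes contribute 0 if equal and 1 otherwise; IBk's actual distance handling may differ in details, and all names are hypothetical.

```java
/** Sketch: 1-nearest-neighbour prediction on the small weather training set. */
class OneNearestNeighbour {

    // Training instances: {outlook, windy}, {temperature, humidity}, play.
    static final String[][] NOMINAL = {
        {"sunny", "FALSE"}, {"sunny", "TRUE"}, {"overcast", "FALSE"}, {"rainy", "FALSE"}};
    static final double[][] NUMERIC = {{85, 85}, {80, 90}, {83, 86}, {70, 96}};
    static final String[] PLAY = {"no", "no", "yes", "yes"};

    /** Squared distance: range-scaled numeric differences plus 0/1 nominal mismatches. */
    static double distance(String[] nomA, double[] numA, String[] nomB, double[] numB) {
        double d = 0;
        for (int j = 0; j < numA.length; j++) {
            double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
            for (double[] row : NUMERIC) {
                min = Math.min(min, row[j]);
                max = Math.max(max, row[j]);
            }
            double range = max > min ? max - min : 1;
            double diff = (numA[j] - numB[j]) / range;
            d += diff * diff;
        }
        for (int j = 0; j < nomA.length; j++) {
            if (!nomA[j].equals(nomB[j])) d += 1;
        }
        return d;
    }

    /** Class value of the closest training instance. */
    static String classify(String outlook, double temperature, double humidity, String windy) {
        String[] nom = {outlook, windy};
        double[] num = {temperature, humidity};
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < PLAY.length; i++) {
            double d = distance(nom, num, NOMINAL[i], NUMERIC[i]);
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return PLAY[best];
    }

    public static void main(String[] args) {
        System.out.println(classify("rainy", 68, 80, "FALSE"));   // yes
        System.out.println(classify("overcast", 64, 65, "TRUE")); // yes
    }
}
```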

Cheers,
Eibe

On Jan 21, 2005, at 12:24 PM, wekalist-request@list.scms.waikato.ac.nz 
wrote:

> Hi all,
>
> Maybe one of you can help me with the following, because I could not
> figure it out in the past 2 days.
>
> I would like to do instance-based learning with the IBk algorithm.
> Now that I know globally how WEKA works from doing the tutorials, I
> want to use WEKA for my own analysis.
>
> To create an easy example, I used the weather file to create my own
> example.
> I created a "training set" and a "test set".
> Now I want the items of the "test set" to be classified "like"
> the training set.
>
> When I do it as I think it should be done, I get the results below.
>
> Can somebody explain how to get this done?
>
> Thanks,
>
> Gijs
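
For reference, the idea behind IBk is k-nearest-neighbour classification: each test instance gets the majority class among its k closest training instances. A stdlib-only sketch of that idea, assuming purely numeric attributes and plain Euclidean distance (Weka's IBk also normalises attributes and handles nominal ones, which this omits). The class name and the made-up (temperature, humidity) values are hypothetical:

```java
import java.util.*;

// Illustrative k-NN sketch with hypothetical names; not Weka's IBk.
public class KnnSketch {

    static double distance(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    // Majority label among the k training points nearest to query (ties: first label seen).
    static String classify(double[][] train, String[] labels, double[] query, int k) {
        Integer[] idx = new Integer[train.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> distance(train[i], query)));
        Map<String, Integer> votes = new LinkedHashMap<>();
        for (int n = 0; n < Math.min(k, idx.length); n++) {
            votes.merge(labels[idx[n]], 1, Integer::sum);
        }
        String best = null;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (best == null || e.getValue() > votes.get(best)) best = e.getKey();
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up (temperature, humidity) training data with a play / don't-play class.
        double[][] train = {{85, 85}, {80, 90}, {83, 86}, {70, 96}, {68, 80}, {64, 65}};
        String[] play = {"no", "no", "yes", "yes", "yes", "yes"};
        double[][] test = {{81, 89}, {66, 70}};
        for (double[] q : test) System.out.println(classify(train, play, q, 1)); // no, then yes
    }
}
```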


From david at dsc.ufcg.edu.br  Wed Jan 26 02:21:45 2005
From: david at dsc.ufcg.edu.br (David Moises B. dos Santos / Pos. COPIN)
Date: Wed Jan 26 02:21:58 2005
Subject: [Wekalist] sampling data
Message-ID: <20050125132035.M42406@dsc.ufcg.edu.br>

Hi,

  Are there techniques for sampling data in WEKA? If so, which
techniques?

  Thanks,

  David

From david at dsc.ufcg.edu.br  Wed Jan 26 08:03:54 2005
From: david at dsc.ufcg.edu.br (David Moises B. dos Santos / Pos. COPIN)
Date: Wed Jan 26 08:04:01 2005
Subject: [Wekalist] quality of induced knowledge
Message-ID: <20050125185208.M15286@dsc.ufcg.edu.br>

Hi,


   Does anyone know of work that deals with the post-processing step of the KDD process?
I'm interested in work that addresses the quality of induced knowledge (e.g.,
readability and conciseness of decision trees) without relying on
statistical measures.

   Thanks,


   David

From terry at letsche.net  Wed Jan 26 10:20:12 2005
From: terry at letsche.net (Terry Letsche)
Date: Wed Jan 26 10:20:24 2005
Subject: [Wekalist] Command-line question
Message-ID: <1106688012.7733.6.camel@localhost.localdomain>

Hi.

Normally, I use the Weka Explorer when evaluating data to select
attributes. How can I do this from the command line, parameter-wise?

This is the kind of operation I'm doing quite a bit, with 10-fold cross-
validation. If I can do this from the command line I can use a parallel
cross-validation routine (weka grid project).

=== Run information ===

Evaluator:    weka.attributeSelection.CfsSubsetEval
Search:       weka.attributeSelection.GeneticSearch -Z 100 -G 20 -C 0.6 -M 0.033 -R 20 -S 1
Relation:     WT_D1_withoutTS-CI-h-.1.csv-weka.filters.unsupervised.attribute.Remove-R1
Instances:    32768
Attributes:   12
              AIR_TEMP_FROM_PREHEAT_COIL_1
              AIR_TEMP_FROM_PREHEAT_COIL_2
              AIR_HTR_9A_GAS_OUT_PRESS
              AIR_HTR_9B_GAS_OUT_PRESS
              CONDENSATE_FLOW
              SELECTED_9_BLR_FEEDWATER_F
              RHTR_ATTEMP_WATER_FLOW
              SHTR_ATTEMP_WATER_FLOW
              BLR_FDW_HEADER_PRESS
              BURNER_TILT_DEGREES
              FEEDER_9ALL_SPEED
              BE_D2
Evaluation mode:    10-fold cross-validation

I'm guessing that from the command line I'd basically pass the calls to
the various java methods, but I'm not sure how to put it all together.

Thank you very much.

Terry Letsche
-- 
Terry Letsche <terry@letsche.net>


From jsw_tom at tom.com  Fri Jan 28 02:20:04 2005
From: jsw_tom at tom.com (Jiang Siwei)
Date: Fri Jan 28 02:21:45 2005
Subject: [Wekalist] breakpoint in jbuider with Weka
Message-ID: <mailman.1.1106832105.12108.wekalist@list.scms.waikato.ac.nz>

Dear all:

    I'm just a beginner with Weka.

    I use Weka in JBuilder 9. The Runtime Configuration Properties are set
    to weka.gui.GUIChooser.

    When I set a breakpoint in NaiveBayes.java and then run "Debug project",
    I get this message:
 -- Cannot configure Java debug process arguments --

com.sun.jdi.connect.IllegalConnectorArgumentsException: Not listening

 Thanks,

 jiangsiwei
 jsw_tom@tom.com
 2005-01-27



From THOMAS.C.JOHNSON at saic.com  Fri Jan 28 04:22:40 2005
From: THOMAS.C.JOHNSON at saic.com (Johnson, Thomas C.)
Date: Fri Jan 28 04:22:57 2005
Subject: [Wekalist] Nominals with large numbers of values
Message-ID: <71BD1AD1D6DAF5488C97EF358746AAA406EF20@nebula.apd.saic.com>

Hello,

I am trying to build a model with a data set that includes nominals with
large numbers of values: several attributes have thousands of possible
values, and one that is of great interest has 186,000 values. Not
surprisingly, Weka runs out of memory unless only a very few instances are used.
Are there any suggestions for how to handle data sets with large numbers of
possible values for nominals?

I'm already using -Xmx1850M to reserve the max heap space I can.  I have 72
attributes to select from, although I'm only using 40 of them.  The rest
have large numbers of possible values, ranging from hundreds to hundreds of
thousands.
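
One common workaround (a general technique, not something Weka prescribes) is to shrink the value set before the data reach Weka: keep the frequent values and map every rare one to a single catch-all value. A minimal sketch, assuming a hypothetical frequency threshold minCount and the hypothetical catch-all label "OTHER":

```java
import java.util.*;

// Illustrative sketch with hypothetical names; not Weka code.
public class CollapseRareValues {

    // Replace every value seen fewer than minCount times with "OTHER".
    static String[] collapse(String[] column, int minCount) {
        Map<String, Integer> freq = new HashMap<>();
        for (String v : column) freq.merge(v, 1, Integer::sum);
        String[] out = new String[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = freq.get(column[i]) >= minCount ? column[i] : "OTHER";
        }
        return out;
    }

    public static void main(String[] args) {
        String[] col = {"a", "b", "a", "c", "a", "b", "d"};
        String[] reduced = collapse(col, 2);
        System.out.println(Arrays.toString(reduced));                     // [a, b, a, OTHER, a, b, OTHER]
        System.out.println(new HashSet<>(Arrays.asList(reduced)).size()); // 3
    }
}
```

The catch-all label must not collide with a real value, and the threshold trades information for memory: a higher minCount leaves fewer distinct nominal values.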

--TcJ

From tang.lei.hawk at gmail.com  Fri Jan 28 19:01:34 2005
From: tang.lei.hawk at gmail.com (lei tang)
Date: Fri Jan 28 19:01:40 2005
Subject: [Wekalist] How to modify the data of Instances?
Message-ID: <3c8b2ce405012722017a61526a@mail.gmail.com>

Hi, sorry to bother you all!
Just a simple problem:

I have read the *.arff file into memory and obtained the Instances.
Then I want to change the instances a little, e.g. turning the
original multi-class problem into a binary-class one. So I need to change
the class labels as well as the attribute values at the class index.
I am just wondering how to do this in Weka.

Thanks!

Lei

From anjiyuan at yahoo.com  Sat Jan 29 02:03:45 2005
From: anjiyuan at yahoo.com (ji an)
Date: Sat Jan 29 02:03:52 2005
Subject: [Wekalist] confidenceFactor
Message-ID: <20050128130345.23436.qmail@web51907.mail.yahoo.com>

Dear all,

Could anyone tell me what the confidenceFactor parameter in
J48 means?
Its default value is 0.25.
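
In C4.5-style pruning, which J48 implements, the confidence factor c controls how pessimistic the error estimate at each node is: the observed error rate f on the N training instances reaching a node is replaced by an upper confidence limit, and subtrees are pruned when that does not make the estimated error worse. Smaller values of c give larger estimates and therefore more pruning. The sketch below uses the normal-approximation formula from Witten and Frank's book, with z the standard normal value whose upper-tail probability is c. J48's own code adds refinements to this formula, so treat the numbers as illustrative; PessimisticError is a hypothetical name:

```java
// Illustrative sketch of the textbook pessimistic error estimate; not J48's code.
public class PessimisticError {

    // Standard normal CDF via Simpson's rule on the density from 0 to |x|.
    static double phi(double x) {
        int n = 1000; // even number of Simpson intervals
        double h = Math.abs(x) / n, sum = 0;
        for (int i = 0; i <= n; i++) {
            double t = i * h;
            double w = (i == 0 || i == n) ? 1 : (i % 2 == 1 ? 4 : 2);
            sum += w * Math.exp(-t * t / 2);
        }
        double half = sum * h / 3 / Math.sqrt(2 * Math.PI);
        return x >= 0 ? 0.5 + half : 0.5 - half;
    }

    // z such that P(Z > z) = c, found by bisection (for 0 < c < 0.5).
    static double zForConfidence(double c) {
        double lo = 0, hi = 10;
        for (int it = 0; it < 100; it++) {
            double mid = (lo + hi) / 2;
            if (1 - phi(mid) > c) lo = mid; else hi = mid;
        }
        return (lo + hi) / 2;
    }

    // Upper confidence limit on the true error rate, given error rate f observed on n instances.
    static double upperErrorEstimate(double f, int n, double c) {
        double z = zForConfidence(c);
        double z2 = z * z;
        double num = f + z2 / (2 * n) + z * Math.sqrt(f / n - f * f / n + z2 / (4.0 * n * n));
        return num / (1 + z2 / n);
    }

    public static void main(String[] args) {
        // A leaf with 2 errors out of 6 training instances, at J48's default c = 0.25.
        System.out.println(upperErrorEstimate(2.0 / 6, 6, 0.25));
        // A smaller confidence factor gives a more pessimistic (larger) estimate.
        System.out.println(upperErrorEstimate(2.0 / 6, 6, 0.10));
    }
}
```

With 2 errors out of 6 instances this gives about 0.47 at c = 0.25 and about 0.59 at c = 0.10.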

Thanks

anj



From dbueno at gmail.com  Sat Jan 29 05:55:39 2005
From: dbueno at gmail.com (Denis Bueno)
Date: Sat Jan 29 05:55:46 2005
Subject: [Wekalist] How to modify the data of Instances?
In-Reply-To: <3c8b2ce405012722017a61526a@mail.gmail.com>
References: <3c8b2ce405012722017a61526a@mail.gmail.com>
Message-ID: <6dbd4d000501280855f387369@mail.gmail.com>

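You can do this using one of the built-in filters in
weka.filters.unsupervised.attribute. `MakeIndicator' replaces a nominal
attribute with a binary indicator for chosen value(s); `NominalToBinary'
instead turns nominal input attributes into binary numeric ones.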
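
Whichever filter is used, the underlying operation is a relabelling of the class column. A stdlib-only sketch of a one-vs-rest mapping on a plain label array, with the hypothetical names positiveClass and "other" for the merged label (inside Weka you would apply a filter to the Instances rather than edit raw arrays):

```java
import java.util.*;

// Illustrative sketch with hypothetical names; not Weka code.
public class OneVsRest {

    // Keep labels equal to positiveClass; map every other label to "other".
    static String[] toBinary(String[] labels, String positiveClass) {
        String[] out = new String[labels.length];
        for (int i = 0; i < labels.length; i++) {
            out[i] = labels[i].equals(positiveClass) ? positiveClass : "other";
        }
        return out;
    }

    public static void main(String[] args) {
        String[] labels = {"setosa", "versicolor", "virginica", "setosa"};
        System.out.println(Arrays.toString(toBinary(labels, "setosa")));
        // [setosa, other, other, setosa]
    }
}
```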



-- 
Denis Bueno

From tang.lei.hawk at gmail.com  Sat Jan 29 06:42:39 2005
From: tang.lei.hawk at gmail.com (lei tang)
Date: Sat Jan 29 06:42:44 2005
Subject: [Wekalist] sampling data
In-Reply-To: <20050125132035.M42406@dsc.ufcg.edu.br>
References: <20050125132035.M42406@dsc.ufcg.edu.br>
Message-ID: <3c8b2ce405012809425e2dd7b5@mail.gmail.com>

You can check the resample or resampleWithWeights methods in the Instances class.
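
Instances.resample draws instances uniformly with replacement, while resampleWithWeights draws them with probability proportional to their weights. A stdlib-only sketch of weighted sampling with replacement using a cumulative-weight table and binary search; this illustrates the idea and is not Weka's implementation (WeightedResample is a hypothetical name):

```java
import java.util.*;

// Illustrative sketch with hypothetical names; not Weka code.
public class WeightedResample {

    // Draw n indices with replacement; index i is chosen with probability weights[i] / sum(weights).
    static int[] sample(double[] weights, int n, Random rnd) {
        double[] cumulative = new double[weights.length];
        double total = 0;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i];
            cumulative[i] = total;
        }
        int[] picks = new int[n];
        for (int k = 0; k < n; k++) {
            double r = rnd.nextDouble() * total;
            // Binary search for the first index whose cumulative weight exceeds r.
            int lo = 0, hi = weights.length - 1;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (cumulative[mid] > r) hi = mid; else lo = mid + 1;
            }
            picks[k] = lo;
        }
        return picks;
    }

    public static void main(String[] args) {
        double[] w = {1, 1, 2}; // instance 2 should be drawn about half the time
        int[] counts = new int[w.length];
        for (int i : sample(w, 10000, new Random(1))) counts[i]++;
        System.out.println(Arrays.toString(counts));
    }
}
```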

Lei



From donmiguelgt78 at yahoo.es  Sat Jan 29 09:04:24 2005
From: donmiguelgt78 at yahoo.es (lolo gt)
Date: Sat Jan 29 09:04:32 2005
Subject: [Wekalist] about FOCUS-SF
Message-ID: <20050128200424.26962.qmail@web25209.mail.ukl.yahoo.com>

Hi all,

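I would like to implement FOCUS-SF (see [1]).
FOCUS-SF replaces exhaustive search with sequential
search, so I would like to know whether the following code
is FOCUS-SF:


import weka.attributeSelection.ASEvaluation;
import weka.attributeSelection.ASSearch;
import weka.attributeSelection.ConsistencySubsetEval;

ASSearch m_asSearch = new FOCUSSearch(); // assumed custom class, not imported (package not given)
ASEvaluation m_asEvaluation = new ConsistencySubsetEval();


Thank you in advance!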

Cheers!!


[1] Lei Yu and Huan Liu. Efficient Feature Selection via Analysis of
Relevance and Redundancy. Journal of Machine Learning Research,
vol. 5, pages 1205-1224, 2004.
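
As the post says, FOCUS-SF keeps the consistency criterion of FOCUS but replaces exhaustive search with sequential forward search. A stdlib-only sketch of that idea under stated assumptions: nominal features coded as ints; the inconsistency rate computed as in consistency-based selection (for each distinct pattern of selected values, count the instances outside that pattern's majority class, then divide by the number of instances); and features added greedily until the subset is as consistent as the full feature set. This is neither Weka's ConsistencySubsetEval nor the paper's code, and FocusSF is a hypothetical name:

```java
import java.util.*;

// Illustrative sketch with hypothetical names; not Weka code.
public class FocusSF {

    // Inconsistency rate of a feature subset (0 = fully consistent).
    static double inconsistency(int[][] data, int[] labels, List<Integer> subset) {
        Map<List<Integer>, Map<Integer, Integer>> groups = new HashMap<>();
        for (int i = 0; i < data.length; i++) {
            List<Integer> key = new ArrayList<>();
            for (int f : subset) key.add(data[i][f]);
            groups.computeIfAbsent(key, k -> new HashMap<>()).merge(labels[i], 1, Integer::sum);
        }
        int bad = 0;
        for (Map<Integer, Integer> classCounts : groups.values()) {
            int total = 0, max = 0;
            for (int c : classCounts.values()) { total += c; max = Math.max(max, c); }
            bad += total - max; // instances not in this pattern's majority class
        }
        return (double) bad / data.length;
    }

    // Sequential forward selection: add the feature that lowers inconsistency most,
    // stopping once the subset matches the full feature set's inconsistency rate.
    static List<Integer> select(int[][] data, int[] labels) {
        int numFeatures = data[0].length;
        List<Integer> all = new ArrayList<>();
        for (int f = 0; f < numFeatures; f++) all.add(f);
        double target = inconsistency(data, labels, all);

        List<Integer> chosen = new ArrayList<>();
        while (inconsistency(data, labels, chosen) > target) {
            int bestF = -1;
            double bestRate = Double.POSITIVE_INFINITY;
            for (int f = 0; f < numFeatures; f++) {
                if (chosen.contains(f)) continue;
                chosen.add(f);
                double rate = inconsistency(data, labels, chosen);
                chosen.remove(chosen.size() - 1);
                if (rate < bestRate) { bestRate = rate; bestF = f; }
            }
            chosen.add(bestF);
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Feature 1 determines the class; features 0 and 2 are noise.
        int[][] data = {{0, 0, 1}, {1, 0, 0}, {0, 1, 0}, {1, 1, 1}, {1, 0, 1}, {0, 1, 1}};
        int[] labels = {0, 0, 1, 1, 0, 1};
        System.out.println(select(data, labels)); // [1]
    }
}
```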






		

From beamt at cs.pitt.edu  Sat Jan 29 10:08:51 2005
From: beamt at cs.pitt.edu (Beatriz Maeireizo Tokeshi)
Date: Sat Jan 29 10:08:56 2005
Subject: [Wekalist] Correlation of attributes 
Message-ID: <Pine.LNX.4.44.0501281607080.4201-100000@selenium.cs.pitt.edu>

Hi!
I need to find the correlation of each pair of attributes of my arff file.
How can I do it? Does Weka have any function in command line to compute
it?
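
Whatever tool ends up computing it, the usual quantity for a pair of numeric attributes is Pearson's correlation coefficient. A stdlib-only sketch, assuming all attributes are numeric and stored column by column (nominal attributes would need encoding first); CorrelationMatrix is a hypothetical name:

```java
// Illustrative sketch with hypothetical names; not Weka code.
public class CorrelationMatrix {

    // Pearson correlation between two equally long columns.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy; sxx += dx * dx; syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    // Entry [a][b] is the correlation between attribute a and attribute b.
    static double[][] matrix(double[][] columns) {
        int k = columns.length;
        double[][] r = new double[k][k];
        for (int a = 0; a < k; a++)
            for (int b = 0; b < k; b++)
                r[a][b] = pearson(columns[a], columns[b]);
        return r;
    }

    public static void main(String[] args) {
        double[][] cols = {{1, 2, 3, 4}, {2, 4, 6, 8}, {4, 3, 2, 1}};
        for (double[] row : matrix(cols)) System.out.println(java.util.Arrays.toString(row));
    }
}
```

A constant column makes the denominator zero, so its correlations come out as NaN.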
Thank you.
Beatriz



From abendav at netvision.net.il  Sun Jan 30 02:33:21 2005
From: abendav at netvision.net.il (Dr. Arie Ben David)
Date: Sun Jan 30 02:32:22 2005
Subject: [Wekalist] Subjective Judgement Dadabase Wanted
Message-ID: <000601c50607$1c82f790$a20784d9@asus>

Dear everyone
We are conducting research in ML and need more data sets of a sort that is not common in the regular ML/DM repositories such as UCI.
We need databases that include real, subjective human decisions, where some (possibly all) of the attribute values, and in particular the class values, are human judgements (i.e., subjective decisions). Examples include lecturer evaluations at the end of a course, where students grade the lecturer on the relevant attributes and also give an overall score; car (or other product) preferences; credit application decisions; and employee hiring decisions, to name just a few.
Typically there will not be too many attributes, as people tend to simplify their subjective decisions, and that's OK.
There may be categorical, ordinal or numeric attribute values (preferably not all numeric).
The data sets are expected to be very noisy, and that's perfectly OK too.
The data sets may include decisions of one or many persons, but they have to be real data sets (not artificial).
Data sets with at least 500 full records are preferable.
We have looked at the UCI repository, and it seems that only two of its 120 or so data sets have the above features: #25, Credit card application by Quinlan, and #118, Car Evaluation. The other data sets linked from the WEKA site lack the required documentation, so we are not sure what they represent in the first place.
If you have or can get any such data set that you can donate, and/or if you are aware of other data sets in public repositories with these properties, we will very much appreciate the data/pointers.
Proper acknowledgements are guaranteed, and if you also want to contribute to the actual research effort, you will gladly be added as an author of any resulting paper.
Thank you
Dr. Arie Ben David
abendav@netvision.net.il
-------------- next part --------------
An HTML attachment was scrubbed...
URL: https://list.scms.waikato.ac.nz/pipermail/wekalist/attachments/20050129/4bbd8285/attachment.htm
From efuzzyone at netscape.net  Sun Jan 30 08:48:42 2005
From: efuzzyone at netscape.net (Surendra Singhi)
Date: Sun Jan 30 08:49:03 2005
Subject: [Wekalist] Re: Newbie Questions - J4.8
In-Reply-To: <41EBE2AB.70605@bch.umontreal.ca>
References: <41EBE2AB.70605@bch.umontreal.ca>
Message-ID: <ctgpao$c28$1@sea.gmane.org>

Sivakumar Kannan wrote:
> Dear all:
> 
> I am a new member to the list and this is my first mail. First, I would 
> like to thank all the wonderful people (Dr. Frank and Dr. Witten) for 
> this amazing tool and the continued support through this mailing list. I 
> am a biologist trying to learn machine learning (bioinformatics) so 
> please bear with me if my questions are too trivial. Here are my 
> questions regarding J4.8
> 
> 1. I have a dataset of more than 12,000 instances and 80 class
> labels. Some class labels have as few as 2 instances while
> others are in the range of 100s. Is there any limit on the minimum number
> of instances (for individual class labels) for doing 10-fold cross
> validation? When I output the "Detailed Accuracy By Class", the class
> labels with very few instances have values "0" for all the
> statistical measures (TP Rate, FP Rate, Precision, etc.) and this
> affects the overall average precision or recall. Should I exclude those
> instances from the dataset?

When you do 10-fold cross-validation, the data is first stratified (that
is, sorted by class label) and then divided into 10 partitions, such
that each partition gets almost the same number of instances from each
class. If a class label in your dataset has fewer than 10 instances,
some of the folds will contain an instance from that class and others
won't.

Because there are so few instances for some class labels, the
classifier may not predict any test instance as belonging to those
classes, which gives a TP rate, FP rate, precision and recall of 0 for
them. Look at the confusion matrix; it may help you interpret the
results better.
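
The effect of stratification on a rare class is easy to see in a small simulation. A minimal sketch, assuming one simple stratification scheme (group instances by class, then deal them out to the folds round-robin); this is not necessarily the exact procedure Weka uses, and StratifiedFolds is a hypothetical name:

```java
import java.util.*;

// Illustrative sketch with hypothetical names; not Weka code.
public class StratifiedFolds {

    // Sort instances by class (stable sort keeps original order within a class),
    // then deal them round-robin into k folds.
    static int[] assignFolds(String[] labels, int k) {
        Integer[] order = new Integer[labels.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparing((Integer i) -> labels[i]));
        int[] fold = new int[labels.length];
        for (int pos = 0; pos < order.length; pos++) fold[order[pos]] = pos % k;
        return fold;
    }

    // Number of distinct folds containing at least one instance of the class.
    static int foldsContaining(String[] labels, int[] fold, String cls) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < labels.length; i++) if (labels[i].equals(cls)) seen.add(fold[i]);
        return seen.size();
    }

    public static void main(String[] args) {
        // 100 instances of class "big", 3 of class "rare".
        String[] labels = new String[103];
        Arrays.fill(labels, 0, 100, "big");
        Arrays.fill(labels, 100, 103, "rare");
        int[] fold = assignFolds(labels, 10);
        System.out.println(foldsContaining(labels, fold, "rare")); // 3
    }
}
```

Here only 3 of the 10 test folds contain a "rare" instance, and each of those folds is scored by a model trained on just 2 "rare" examples.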


> 
> 2. What exactly is resampling? If I use this during preprocessing (with 
> default values), I get improved prediction accuracy.
Resampling means re-drawing data from the population according to a
given distribution.
In Weka the "resample" function simply picks instances from the
dataset at random and creates a new dataset with the same number of
instances as the original. An instance from the original dataset may
appear more than once in the new one.
One reason it might give you improved accuracy is that classes with
only a few instances may be missing entirely from the resampled
dataset, which removes the hardest-to-predict classes. Another is that
duplicated instances can end up in both the training and the test folds
of a subsequent cross-validation, which makes the accuracy estimate
optimistic.

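code for resample (a method of Weka's Instances class; Random is java.util.Random):

public Instances resample(Random random) {

  // Empty dataset with the same header (attributes), capacity numInstances()
  Instances newData = new Instances(this, numInstances());
  // Draw with replacement until the new set is as large as the original
  while (newData.numInstances() < numInstances()) {
    newData.add(instance(random.nextInt(numInstances())));
  }
  return newData;
}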
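
Because resample draws with replacement, a sample of the same size contains on average only about 1 - 1/e (roughly 63.2%) of the distinct original instances, and a class with m instances out of n is missing entirely with probability (1 - m/n)^n, roughly e^-m. A quick stdlib-only check of both figures, using the same kind of uniform draw as the resample method (BootstrapCoverage is a hypothetical name):

```java
import java.util.*;

// Illustrative sketch with hypothetical names; not Weka code.
public class BootstrapCoverage {

    // Fraction of distinct indices that appear in one bootstrap sample of size n.
    static double distinctFraction(int n, Random random) {
        boolean[] seen = new boolean[n];
        int distinct = 0;
        for (int draw = 0; draw < n; draw++) {
            int i = random.nextInt(n); // uniform draw with replacement, as in resample()
            if (!seen[i]) { seen[i] = true; distinct++; }
        }
        return (double) distinct / n;
    }

    public static void main(String[] args) {
        System.out.println(distinctFraction(100000, new Random(42))); // close to 0.632
        // Chance that a class with 2 of 1000 instances vanishes: (1 - 2/1000)^1000
        System.out.println(Math.pow(1 - 2.0 / 1000, 1000));            // about 0.135
    }
}
```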

If you know Java then explore the Weka code using any Java IDE (Eclipse 
suggested). This will help you in getting precise answers to all your 
questions.

-- 
Surendra Singhi

www.public.asu.edu/~sksinghi/


From jason.brownlee at internode.on.net  Sun Jan 30 10:15:48 2005
From: jason.brownlee at internode.on.net (Jason Brownlee)
Date: Sun Jan 30 10:15:53 2005
Subject: [Wekalist] AIRS for WEKA
References: <Pine.GSO.4.21.0501251807500.12060-100000@caolho.dca.fee.unicamp.br>
Message-ID: <002801c50647$b4070210$0200a8c0@Jason>

Andre,

Unfortunately, my adviser (I'm a PhD student) has told me to make the site
visible only to my group until he has had a chance to read my technical
reports. Sorry. The good news is that there are now two more WEKA
implementations on the site: CLONALG and Immunos-81. These are two other
classifiers based on artificial immune systems that have been shown to
perform well.

I will let this list know as soon as I'm allowed to open the reports and
software up again, which should be within a week or two.

Sincerely,

Jason


----- Original Message ----- 
From: "Andre Luis V. Coelho" <coelho@dca.fee.unicamp.br>
To: "Jason Brownlee" <jason.brownlee@internode.on.net>
Sent: Wednesday, January 26, 2005 7:16 AM
Subject: Re: [Wekalist] AIRS for WEKA


> Hi, Jason.
> Last Sunday, I downloaded the AIRS package you made
> available. However, I stored it on a friend's flash disk
> and he is now out of town. When I try to download it again, it seems
> that the link is now broken. Could you provide me an alternative URL for
> the package? If that is not possible, could you send me the AIRS Java
> package at this e-mail address?
> Thanks in advance,
> PS: perhaps you'd like other people on the Weka lists to be aware of the
> new location for your package as well.
> ______________________________________________________________________
>   Eng. Dr. Andre L. V. Coelho       <coelho@dca.fee.unicamp.br>
>   Phone (home): +55-85-32344070              Mobile: +55-85-99417851
>      http://www.dca.fee.unicamp.br/~coelho
> ______________________________________________________________________
>
> On Fri, 21 Jan 2005, Jason Brownlee wrote:
>
>> All,
>>
>> I've prepared a version of the AIRS (Artificial Immune Recognition
>> System) classification algorithm for WEKA. The implementation contains
>> three versions of the technique: AIRS1, AIRS2 and Parallel
>> AIRS. Please see http://www.it.swin.edu.au/centres/ciscp/ais/ for further
>> details.
>>
>> Sincerely,
>>
>> Jason
>
> 


From kasiopi at hotmail.com  Mon Jan 31 00:21:45 2005
From: kasiopi at hotmail.com (Maria Tsiakmaki)
Date: Mon Jan 31 00:22:07 2005
Subject: [Wekalist] Information Gain
Message-ID: <BAY19-F1684E233336177D56470D0A97B0@phx.gbl>

Hello.

I am trying to use Information Gain attribute selection from my Java
code. How do I do this?

If I use, for instance, DiscretizeFilter:
private Filter m_Filter = new DiscretizeFilter();

then before new instances are added to the classifier, they are filtered:

m_Filter.inputFormat(m_Data);
Instances filteredData = Filter.useFilter(m_Data, m_Filter);

and the filtered data are used to update the classifier:
m_Classifier.buildClassifier(filteredData);

This is just as the basic Weka tutorial describes in messageClassifier.java.

Now I have to use an Information Gain filter. Is it applied to each instance
as before? I think it is used after the classifier is built: it evaluates
the attributes and then the classifier is rebuilt?
Generally, I'm lost.. :<<
I will keep searching, but meanwhile I thought I'd post to ask how to use
information gain in my Java code. I am working on a text classification
problem. I have 1000 attributes: the most frequent words in a folder of
already categorized text.

Thanks for your consideration.
Maria.


From pythonner at gmail.com  Mon Jan 31 04:28:40 2005
From: pythonner at gmail.com (David)
Date: Mon Jan 31 04:28:52 2005
Subject: [Wekalist] Information Gain
In-Reply-To: <BAY19-F1684E233336177D56470D0A97B0@phx.gbl>
References: <BAY19-F1684E233336177D56470D0A97B0@phx.gbl>
Message-ID: <a39a667005013007281c3aaac3@mail.gmail.com>

Hello,

Do something like:

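---

// Needs weka.core.Instances and weka.attributeSelection.{ASEvaluation,
// AttributeSelection, InfoGainAttributeEval, Ranker}.
// m_DummyLearner.GetTrainInstances() is application code, not Weka; any
// Instances object with its class index set will do.
Instances insts = m_DummyLearner.GetTrainInstances();

ASEvaluation eval = new InfoGainAttributeEval(); // scores each attribute by information gain
Ranker search = new Ranker();                    // orders attributes by their individual scores
search.setNumToSelect(NUM_TO_SELECT);            // keep only the top NUM_TO_SELECT

m_AttributeSelection = new AttributeSelection();
m_AttributeSelection.setEvaluator(eval);
m_AttributeSelection.setSearch(search);
m_AttributeSelection.SelectAttributes(insts);    // may throw Exception

---

m_AttributeSelection.selectedAttributes() returns the indexes of the
chosen attributes.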
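
For reference, the score that InfoGainAttributeEval ranks by is the information gain H(class) - H(class | attribute). A stdlib-only sketch for the text setting in the question, assuming binary word-present/word-absent attributes (Weka itself handles nominal attributes directly and discretizes numeric ones first). InfoGainRanking and its methods are hypothetical names:

```java
import java.util.*;

// Illustrative sketch with hypothetical names; not Weka code.
public class InfoGainRanking {

    // Entropy in bits of a class distribution given as counts.
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        double h = 0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    // Information gain of binary attribute f (column f of present) about the class.
    static double infoGain(boolean[][] present, int[] cls, int numClasses, int f) {
        int[] all = new int[numClasses], with = new int[numClasses], without = new int[numClasses];
        int nWith = 0;
        for (int i = 0; i < cls.length; i++) {
            all[cls[i]]++;
            if (present[i][f]) { with[cls[i]]++; nWith++; } else without[cls[i]]++;
        }
        int n = cls.length;
        double conditional = 0;
        if (nWith > 0) conditional += (double) nWith / n * entropy(with);
        if (nWith < n) conditional += (double) (n - nWith) / n * entropy(without);
        return entropy(all) - conditional;
    }

    // Indices of the top-k attributes by information gain, best first.
    static int[] topK(boolean[][] present, int[] cls, int numClasses, int k) {
        int numAttrs = present[0].length;
        Integer[] idx = new Integer[numAttrs];
        double[] gain = new double[numAttrs];
        for (int f = 0; f < numAttrs; f++) { idx[f] = f; gain[f] = infoGain(present, cls, numClasses, f); }
        Arrays.sort(idx, (a, b) -> Double.compare(gain[b], gain[a]));
        int[] out = new int[Math.min(k, numAttrs)];
        for (int i = 0; i < out.length; i++) out[i] = idx[i];
        return out;
    }

    public static void main(String[] args) {
        // 4 documents x 3 words; word 0 appears exactly in the class-1 documents.
        boolean[][] present = {
            {true, true, false}, {true, false, true},
            {false, true, true}, {false, false, false}};
        int[] cls = {1, 1, 0, 0};
        System.out.println(infoGain(present, cls, 2, 0));               // 1.0
        System.out.println(Arrays.toString(topK(present, cls, 2, 1))); // [0]
    }
}
```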

good luck




-- 
Balie - Baseline Information Extraction
http://balie.sourceforge.net
[Open Source ~ 100% Java ~ Using Weka ~ Multilingual]

From jcrowell at rics.bwh.harvard.edu  Mon Jan 31 10:29:03 2005
From: jcrowell at rics.bwh.harvard.edu (jcrowell)
Date: Mon Jan 31 10:25:58 2005
Subject: [Wekalist] newbie question about attributes, instances,
	and classifiers
Message-ID: <000901c50712$b8b17310$1f35ae86@frodo>

Hello all,

I am working through the example of how to interact with WEKA
programmatically in chapter 8 of the book and am adapting it to use a Naïve
Bayes classifier.

The example relies on a hard-coded list of attributes which are described in
the comments as "Our (rather arbitrary) set of keywords".

My question: Why do I have to provide a set of keywords?  Isn't it the job
of the naïve Bayes classifier to take the training data I have, which is
already sorted into "hit" and "miss" categories, and compute what the most
telling attributes are?

Here is how I understand how a classifier should work: I should be able to
feed it a bunch of files and for each one say "this category" or "that
category" and then, when I have fed it enough to train it, I should be able
to send it a new file and it should try and predict the best category for
me.  I don't understand where the list of user-provided keywords comes into
play.
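
A classifier such as naive Bayes only sees a fixed list of attributes; it cannot read raw files. In the book's example the keyword list simply defines those attributes. The vocabulary can instead be built from the training files themselves, which is what Weka's StringToWordVector filter automates. A stdlib-only sketch of the idea, assuming lowercase tokenisation on non-alphanumeric characters and word-presence (0/1) attributes; BagOfWords and its methods are hypothetical names:

```java
import java.util.*;

// Illustrative sketch with hypothetical names; not Weka code.
public class BagOfWords {

    // Lowercase and split on anything that is not a letter or digit.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) if (!t.isEmpty()) out.add(t);
        return out;
    }

    // Vocabulary = every word seen in the training documents, in sorted order.
    static List<String> vocabulary(List<String> trainingDocs) {
        SortedSet<String> vocab = new TreeSet<>();
        for (String doc : trainingDocs) vocab.addAll(tokens(doc));
        return new ArrayList<>(vocab);
    }

    // One 0/1 attribute per vocabulary word; words unseen in training are ignored.
    static int[] toVector(String doc, List<String> vocab) {
        Set<String> words = new HashSet<>(tokens(doc));
        int[] v = new int[vocab.size()];
        for (int i = 0; i < v.length; i++) v[i] = words.contains(vocab.get(i)) ? 1 : 0;
        return v;
    }

    public static void main(String[] args) {
        List<String> train = List.of("Project meeting moved", "Win a free prize now");
        List<String> vocab = vocabulary(train);
        System.out.println(vocab); // [a, free, meeting, moved, now, prize, project, win]
        System.out.println(Arrays.toString(toVector("free prize meeting", vocab))); // [0, 1, 1, 0, 0, 1, 0, 0]
    }
}
```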

Thanks to anyone who can help me with this.  Sorry for being a newbie.

Jon Crowell
Software Engineer
Decision Systems Group


From ihok at hotmail.com  Mon Jan 31 10:59:28 2005
From: ihok at hotmail.com (Jack Tanner)
Date: Mon Jan 31 11:00:12 2005
Subject: [Wekalist] learning under uncertainty
Message-ID: <BAY102-F889BD1953AA64FA83F3C2CA7B0@phx.gbl>

I apologize that this is more of a generic ML question than a WEKA specific 
one...

I'd like to learn from a data set where I don't have consistent confidence 
in all of the instances in my training set. The variability in confidence is 
a result of how the data were collected.

I'd like to weight each instance by my confidence in it. Is there a way of
expressing this problem within WEKA, or any other ML software? Any
algorithms are fine, but I'm especially interested in those that can also
handle bag-of-words feature vectors.

I may also need to evaluate on a test set where I don't have
consistent confidence in the test instances. Can I do a similar kind of
weighting?
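
In Weka each Instance carries a weight (Instance.setWeight), and classifiers that implement WeightedInstancesHandler, NaiveBayes among them, take those weights into account when training. The mechanism is simple: an instance contributes its weight instead of 1 to every count. A stdlib-only sketch of a confidence-weighted multinomial naive Bayes for bag-of-words counts, plus an accuracy in which each test instance counts in proportion to its weight. The add-one smoothing and all names are assumptions of this sketch, not Weka code:

```java
// Illustrative sketch with hypothetical names; not Weka code.
public class WeightedNaiveBayes {

    final double[] classWeight;     // summed instance weights per class
    final double[][] wordWeight;    // summed weight * count per class and word
    final double[] totalWordWeight; // summed weight * count per class over all words

    // docs[i][w] = count of word w in document i; weights[i] = confidence in instance i.
    WeightedNaiveBayes(int[][] docs, int[] cls, double[] weights, int numClasses) {
        int vocab = docs[0].length;
        classWeight = new double[numClasses];
        wordWeight = new double[numClasses][vocab];
        totalWordWeight = new double[numClasses];
        for (int i = 0; i < docs.length; i++) {
            classWeight[cls[i]] += weights[i];
            for (int w = 0; w < vocab; w++) {
                wordWeight[cls[i]][w] += weights[i] * docs[i][w];
                totalWordWeight[cls[i]] += weights[i] * docs[i][w];
            }
        }
    }

    // Most probable class, using add-one smoothing and log probabilities.
    int predict(int[] doc) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        double totalWeight = 0;
        for (double cw : classWeight) totalWeight += cw;
        for (int c = 0; c < classWeight.length; c++) {
            double score = Math.log((classWeight[c] + 1) / (totalWeight + classWeight.length));
            for (int w = 0; w < doc.length; w++) {
                double p = (wordWeight[c][w] + 1) / (totalWordWeight[c] + doc.length);
                score += doc[w] * Math.log(p);
            }
            if (score > bestScore) { bestScore = score; best = c; }
        }
        return best;
    }

    // Accuracy where each test instance counts in proportion to its weight.
    double weightedAccuracy(int[][] docs, int[] cls, double[] weights) {
        double correct = 0, total = 0;
        for (int i = 0; i < docs.length; i++) {
            total += weights[i];
            if (predict(docs[i]) == cls[i]) correct += weights[i];
        }
        return correct / total;
    }

    public static void main(String[] args) {
        // 3 words; the last training document looks mislabelled but has low confidence.
        int[][] train = {{3, 0, 0}, {2, 1, 0}, {0, 0, 3}, {0, 1, 2}, {3, 0, 0}};
        int[] cls = {0, 0, 1, 1, 1};
        double[] conf = {1.0, 1.0, 1.0, 1.0, 0.1};
        WeightedNaiveBayes nb = new WeightedNaiveBayes(train, cls, conf, 2);
        System.out.println(nb.predict(new int[] {2, 0, 0})); // 0
    }
}
```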

Thanks in advance for your advice.



From cplyon928 at comcast.net  Mon Jan 31 16:55:29 2005
From: cplyon928 at comcast.net (Clifford Lyon)
Date: Mon Jan 31 16:55:46 2005
Subject: [Wekalist] Independent Component Analysis
In-Reply-To: <20040204071040.23587.qmail@web13009.mail.yahoo.com>
References: <20040204071040.23587.qmail@web13009.mail.yahoo.com>
Message-ID: <41FDAC31.7040101@comcast.net>

I thought someone mentioned translating the MATLAB FastICA version to Weka
using Jama, but I can't find the post now. Does anyone know whether
FastICA made it to Weka?
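
Whether or not a port exists, the core of FastICA is short. A stdlib-only sketch of the one-unit fixed-point iteration for two mixed signals: centre the data, whiten it via the eigen-decomposition of the 2x2 covariance, then iterate w <- E{z g(w.z)} - E{g'(w.z)} w with g = tanh, renormalising w each step. The source signals and mixing matrix are made up for the demo; this is neither the MATLAB FastICA package nor Weka code, it assumes the two mixtures are correlated, and it extracts only one component (more would need a deflation step). FastIcaOneUnit is a hypothetical name:

```java
import java.util.Random;

// Illustrative one-unit FastICA sketch with hypothetical names; not Weka code.
public class FastIcaOneUnit {

    // Pearson correlation, used only to check the result.
    static double corr(double[] p, double[] q) {
        int n = p.length;
        double mp = 0, mq = 0;
        for (int i = 0; i < n; i++) { mp += p[i]; mq += q[i]; }
        mp /= n; mq /= n;
        double spq = 0, spp = 0, sqq = 0;
        for (int i = 0; i < n; i++) {
            spq += (p[i] - mp) * (q[i] - mq);
            spp += (p[i] - mp) * (p[i] - mp);
            sqq += (q[i] - mq) * (q[i] - mq);
        }
        return spq / Math.sqrt(spp * sqq);
    }

    // Estimate one independent component from mixtures x[0], x[1].
    static double[] extract(double[][] x, Random rnd) {
        int n = x[0].length;
        double[][] c = new double[2][n]; // 1. centre each mixture
        for (int r = 0; r < 2; r++) {
            double mean = 0;
            for (double v : x[r]) mean += v;
            mean /= n;
            for (int i = 0; i < n; i++) c[r][i] = x[r][i] - mean;
        }
        double a = 0, b = 0, d = 0; // 2. covariance [[a, b], [b, d]]
        for (int i = 0; i < n; i++) {
            a += c[0][i] * c[0][i]; b += c[0][i] * c[1][i]; d += c[1][i] * c[1][i];
        }
        a /= n; b /= n; d /= n;
        double half = (a + d) / 2, rad = Math.sqrt((a - d) * (a - d) / 4 + b * b);
        double[] lambda = {half + rad, half - rad};
        double[][] z = new double[2][n]; // whitened data: z = D^(-1/2) E^T c
        for (int k = 0; k < 2; k++) {
            // (b, lambda - a) is an eigenvector when b != 0 (assumed: mixtures correlated)
            double ex = b, ey = lambda[k] - a, norm = Math.hypot(ex, ey);
            for (int i = 0; i < n; i++)
                z[k][i] = (ex * c[0][i] + ey * c[1][i]) / norm / Math.sqrt(lambda[k]);
        }
        double[] w = {rnd.nextGaussian(), rnd.nextGaussian()}; // 3. fixed-point iteration
        double len = Math.hypot(w[0], w[1]);
        w[0] /= len; w[1] /= len;
        for (int iter = 0; iter < 200; iter++) {
            double n0 = 0, n1 = 0, gPrime = 0;
            for (int i = 0; i < n; i++) {
                double g = Math.tanh(w[0] * z[0][i] + w[1] * z[1][i]);
                n0 += z[0][i] * g; n1 += z[1][i] * g;
                gPrime += 1 - g * g; // derivative of tanh
            }
            n0 = n0 / n - gPrime / n * w[0];
            n1 = n1 / n - gPrime / n * w[1];
            double norm = Math.hypot(n0, n1);
            n0 /= norm; n1 /= norm;
            double change = Math.abs(Math.abs(n0 * w[0] + n1 * w[1]) - 1); // ignore sign flips
            w[0] = n0; w[1] = n1;
            if (change < 1e-12) break;
        }
        double[] y = new double[n]; // 4. project whitened data onto w
        for (int i = 0; i < n; i++) y[i] = w[0] * z[0][i] + w[1] * z[1][i];
        return y;
    }

    public static void main(String[] args) {
        int n = 2000;
        double[] s1 = new double[n], s2 = new double[n];
        double[][] x = new double[2][n]; // mixtures x = A s with A = [[1, 0.6], [0.4, 1]]
        for (int i = 0; i < n; i++) {
            s1[i] = Math.sin(i * 0.05);      // sinusoid
            s2[i] = (i * 0.037) % 1.0 - 0.5; // sawtooth
            x[0][i] = s1[i] + 0.6 * s2[i];
            x[1][i] = 0.4 * s1[i] + s2[i];
        }
        double[] y = extract(x, new Random(0));
        System.out.println(Math.max(Math.abs(corr(y, s1)), Math.abs(corr(y, s2)))); // close to 1
    }
}
```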

Thanks

Michael Dell Junior wrote:
> Hi there,
>
> I would like to know whether someone has written/developed an
> Independent Component Analysis Module For Weka.
>
> Best,
>
> Michael


